---
url: 'https://adk.nht.io/assembly/batteries-llm.md'
description: OpenAI-compatible and WebLLM executors that you wire in one line.
---

# LLM batteries

## LLM summary — LLM batteries

* `@nhtio/adk/batteries/llm/openai_chat_completions` ships the `OpenAIChatCompletionsAdapter` class.
* `@nhtio/adk/batteries/llm/webllm_chat_completions` ships the `WebLLMChatCompletionsAdapter` class for browser-native local model execution.
* They satisfy the `DispatchExecutorFn` contract. A battery is a complete executor.
* Compatible with any endpoint speaking the OpenAI Chat Completions wire shape (cloud APIs, self-hosted servers, proxy gateways).
* Constructor: `new OpenAIChatCompletionsAdapter(options)` validates baseline options at construction time and throws `E_INVALID_OPENAI_CHAT_COMPLETIONS_OPTIONS` on bad config.
* Per-iteration validation also occurs during dispatch.
* Required constructor field: `model: string`.
* Optional ADK-control fields: `apiKey`, `baseURL`, `headers`, `stream`, `streamIdleTimeoutMs`, `requestTimeoutMs`, `retry`, `fetch`, `bucketOrder`, `contextWindow`, `selfIdentity`, `thoughtSurfacing`, `tokenEncoding`, `replayCompatibility`, `reasoningFieldPrecedence`, `helpers`, `autoAck`, `strictToolChoice`, `unsupportedMediaPolicy`.
* `tokenEncoding` is `TokenEncoding | null`, not `string`.
* `thoughtSurfacing` supports `'all-self'`, `'latest-self'`, and `'all'`.
* Three-layer options merging, from lowest to highest precedence: constructor baseline -> executor overrides -> per-iteration stash overrides.
* `ctx.stash` is a `Registry` instance — use `ctx.stash.set()` and `ctx.stash.get()`, not bracket access.
* Per-iteration stash override: call `ctx.stash.set(OpenAIChatCompletionsAdapter.STASH_KEY, ...)` in `dispatchInputPipeline` middleware; the adapter reads it with `ctx.stash.get(OpenAIChatCompletionsAdapter.STASH_KEY, {})`.
* `helpers` override (`Partial<ChatCompletionsHelpers>`): 18 pluggable translation functions, including `renderUntrustedContent`, `renderTrustedContent`, `renderStandingInstructions` (which receives an `Iterable<Tokenizable>`), `renderMemories`, `renderRetrievables`, and the per-source retrievable sub-renderers.
* The adapter calls `SpooledArtifact.forgeTools()` internally; you do not need to call `forgeTools` or bind context for locally forged tools.

Writing custom HTTP fetch blocks, manual Server-Sent Events (SSE) streaming loops, and retry logic by hand is usually not the interesting part of your agent. Use the batteries unless you are deliberately replacing the execution loop.

```typescript
executorCallback: new OpenAIChatCompletionsAdapter({ model, apiKey, autoAck: true }).executor()
```

That is it. That single line satisfies the entire [`DispatchExecutorFn`](https://adk.nht.io/api/@nhtio/adk/dispatch_runner/type-aliases/DispatchExecutorFn) contract. `autoAck: true` tells the executor to call `ctx.ack()` automatically after a tool-call-free response; the default is `false`, meaning the implementor owns turn completion.

ADK ships two LLM batteries: [`OpenAIChatCompletionsAdapter`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/openai_chat_completions/adapter/classes/OpenAIChatCompletionsAdapter) and [`WebLLMChatCompletionsAdapter`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/webllm_chat_completions/adapter/classes/WebLLMChatCompletionsAdapter). They satisfy [`DispatchExecutorFn`](https://adk.nht.io/api/@nhtio/adk/dispatch_runner/type-aliases/DispatchExecutorFn) directly.

They handle response streaming (SSE, in the case of the Chat Completions adapter), token accounting, safety envelopes, tool call dispatching, artifact forging, and transient error recovery. They are not small convenience helpers. They are executor implementations.

## Compatible Endpoints

### OpenAIChatCompletionsAdapter

The adapter works against any endpoint speaking the OpenAI Chat Completions wire format:

* **Cloud model APIs** — anything that natively speaks this wire shape.
* **Self-hosted inference servers** — any server that exposes a Chat Completions-compatible HTTP interface.
* **Proxy gateways and routing layers** that expose a standard `/v1/chat/completions` interface.

Point `baseURL` at your endpoint. The adapter sends standard HTTP. It does not care what sits behind it.

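To make "standard HTTP" concrete, here is a minimal sketch of the kind of request an OpenAI-compatible client sends: a `POST` to `{baseURL}/chat/completions` with an optional bearer token and a JSON body. `buildChatCompletionsRequest` and its option types are hypothetical, written only for this illustration; they are not `@nhtio/adk` exports, and the adapter's real request construction may differ in detail. The default base URL and `stream: true` mirror the defaults listed in the control-field table below.

```typescript
// Hypothetical illustration of an OpenAI-compatible Chat Completions request.
// Not the adapter's implementation; every name here is made up for this sketch.
interface SketchOptions {
  model: string
  apiKey?: string
  baseURL?: string
  headers?: Record<string, string>
  stream?: boolean
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
}

interface SketchRequest {
  url: string
  init: { method: 'POST'; headers: Record<string, string>; body: string }
}

function buildChatCompletionsRequest(options: SketchOptions, messages: ChatMessage[]): SketchRequest {
  // Trim trailing slashes so '.../v1' and '.../v1/' resolve to the same URL.
  const base = (options.baseURL ?? 'https://api.openai.com/v1').replace(/\/+$/, '')
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    // Bearer auth only when a key is configured (local servers often need none).
    ...(options.apiKey ? { Authorization: `Bearer ${options.apiKey}` } : {}),
    // Custom headers are spread last in this sketch.
    ...options.headers,
  }
  return {
    url: `${base}/chat/completions`,
    init: {
      method: 'POST',
      headers,
      body: JSON.stringify({ model: options.model, messages, stream: options.stream ?? true }),
    },
  }
}
```

With `baseURL: 'http://localhost:11434/v1/'` (the OpenAI-compatible endpoint of a local Ollama server) and no `apiKey`, the sketch yields a `POST` to `http://localhost:11434/v1/chat/completions` without an `Authorization` header.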
### WebLLMChatCompletionsAdapter

Runs models locally via WebGPU, in the browser or in JS runtimes that support it. Use it for local-first or zero-server deployments. It accepts [`WebLLMChatCompletionsAdapterOptions`](https://adk.nht.io/api/@nhtio/adk/batteries/interfaces/WebLLMChatCompletionsAdapterOptions) to configure model loading and cache policies.

## Construction and Validation

The constructor validates baseline options immediately, so config bugs fail loud and fast. Pass junk into [`OpenAIChatCompletionsAdapter`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/openai_chat_completions/adapter/classes/OpenAIChatCompletionsAdapter) and it throws [`E_INVALID_OPENAI_CHAT_COMPLETIONS_OPTIONS`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/openai_chat_completions/exceptions/variables/E_INVALID_OPENAI_CHAT_COMPLETIONS_OPTIONS) on the spot; pass junk into [`WebLLMChatCompletionsAdapter`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/webllm_chat_completions/adapter/classes/WebLLMChatCompletionsAdapter) and it throws [`E_INVALID_WEBLLM_CHAT_COMPLETIONS_OPTIONS`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/webllm_chat_completions/exceptions/variables/E_INVALID_WEBLLM_CHAT_COMPLETIONS_OPTIONS). Merged executor and stash overrides are revalidated at dispatch time.

```typescript
import { OpenAIChatCompletionsAdapter } from '@nhtio/adk/batteries/llm/openai_chat_completions'

const adapter = new OpenAIChatCompletionsAdapter({
  model: process.env.MODEL_ID!,
  apiKey: process.env.API_KEY,
})
```

`model` is the only strictly required field. Everything else is optional; some fields have runtime defaults.

::: danger Validation on Overrides
Bypassing the constructor does not bypass validation. If you inject malformed config into executor overrides or the iteration stash, [`OpenAIChatCompletionsAdapter`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/openai_chat_completions/adapter/classes/OpenAIChatCompletionsAdapter) will throw [`E_INVALID_OPENAI_CHAT_COMPLETIONS_OPTIONS`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/openai_chat_completions/exceptions/variables/E_INVALID_OPENAI_CHAT_COMPLETIONS_OPTIONS) and [`WebLLMChatCompletionsAdapter`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/webllm_chat_completions/adapter/classes/WebLLMChatCompletionsAdapter) will throw [`E_INVALID_WEBLLM_CHAT_COMPLETIONS_OPTIONS`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/webllm_chat_completions/exceptions/variables/E_INVALID_WEBLLM_CHAT_COMPLETIONS_OPTIONS) at dispatch time.
:::

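The fail-fast idea can be pictured with a tiny validator. This is a hypothetical sketch that checks only two rules stated on this page (`model` must be a non-empty string; a non-null `tokenEncoding` requires `contextWindow`). The real adapter validates far more and throws `E_INVALID_OPENAI_CHAT_COMPLETIONS_OPTIONS`; `OptionsError` and `validateBaseline` are illustrative stand-ins, not ADK exports.

```typescript
// Hypothetical stand-in for the adapter's validation error (illustration only).
class OptionsError extends Error {
  readonly code = 'E_INVALID_OPENAI_CHAT_COMPLETIONS_OPTIONS'
  constructor(message: string) {
    super(message)
    this.name = 'OptionsError'
  }
}

interface BaselineOptions {
  model?: unknown
  contextWindow?: number
  tokenEncoding?: unknown // stands in for TokenEncoding | null
}

// Throws on the first broken rule, so config bugs surface at construction time.
function validateBaseline(options: BaselineOptions): void {
  if (typeof options.model !== 'string' || options.model.length === 0) {
    throw new OptionsError('`model` must be a non-empty string')
  }
  const hasEncoding = options.tokenEncoding !== undefined && options.tokenEncoding !== null
  if (hasEncoding && options.contextWindow === undefined) {
    throw new OptionsError('a non-null `tokenEncoding` requires `contextWindow`')
  }
}
```

Running the same check again on merged executor and stash overrides is what catches bad values that never passed through the constructor.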
## Three-Layer Options Merging

The adapter merges configuration from three sources at each iteration, listed here from lowest to highest precedence:

::: code-group

```typescript [1. Constructor Baseline]
// Lowest precedence - the global fallback config
const adapter = new OpenAIChatCompletionsAdapter({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
  temperature: 0.7,
  autoAck: true,
})
```

```typescript [2. Executor Overrides]
// Mid precedence - applies to every turn run by this TurnRunner
const runner = new TurnRunner({
  ...storageAdapter,
  executorCallback: adapter.executor({
    temperature: 0.2, // Overrides 0.7 constructor baseline
    max_completion_tokens: 1024,
  }),
})
```

```typescript [3. Stash Overrides]
// Highest precedence - dynamic adjustments for a single iteration
const costControlMiddleware: DispatchPipelineMiddlewareFn = async (ctx, next) => {
  ctx.stash.set(OpenAIChatCompletionsAdapter.STASH_KEY, {
    model: 'gpt-4o-mini', // Downgrade model dynamically
    temperature: 0.0,
  })
  await next()
}
```

:::

::: danger Bracket Access Mismatch
`ctx.stash` is a `Registry` instance, not a plain object. Bracket assignment like `ctx.stash[STASH_KEY] = ...` will not type-check, and the adapter reads only via `.get()`. Use `ctx.stash.set(OpenAIChatCompletionsAdapter.STASH_KEY, ...)`.
:::

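Why bracket assignment fails silently is easiest to see with a toy registry. `MiniRegistry` is a hypothetical, `Map`-backed stand-in for ADK's `Registry`, not its real implementation; it only mimics the `set` and `get(key, fallback)` calls described on this page. A bracket write lands on an ordinary object property, so a later `.get()` never sees it and returns the fallback.

```typescript
// Hypothetical Map-backed stand-in for ADK's Registry (illustration only).
class MiniRegistry {
  private readonly entries = new Map<string, unknown>()

  set(key: string, value: unknown): void {
    this.entries.set(key, value)
  }

  get<T>(key: string, fallback: T): T {
    return this.entries.has(key) ? (this.entries.get(key) as T) : fallback
  }
}

const STASH_KEY = 'hypothetical-stash-key'

const wrong = new MiniRegistry()
// Bracket write: needs a double cast to compile, and only creates an own property.
;(wrong as unknown as Record<string, unknown>)[STASH_KEY] = { temperature: 0 }

const right = new MiniRegistry()
right.set(STASH_KEY, { temperature: 0 })

// wrong.get(STASH_KEY, {}) falls back to {}; right.get(STASH_KEY, {}) returns the override.
```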
### Merging Rules

* For `headers`, `helpers`, and `retry`: layers are merged key-by-key. A stash override that sets one custom header does not clear the headers defined in your constructor.
* For all other fields: the highest-precedence layer that defines a value replaces the lower-precedence values outright.

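The two merging rules fit in a few lines. `mergeLayers` is a hypothetical helper written for this page, not the adapter's internal function; it merges `headers`, `helpers`, and `retry` key by key and lets the highest-precedence defined value win for everything else.

```typescript
// Hypothetical sketch of the documented merge rules; not the adapter's code.
type Options = Record<string, unknown>

const KEYWISE_FIELDS = new Set(['headers', 'helpers', 'retry'])

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Layers are ordered lowest to highest precedence: constructor, executor, stash.
function mergeLayers(...layers: Array<Options | undefined>): Options {
  const merged: Options = {}
  for (const layer of layers) {
    if (!layer) continue
    for (const [key, value] of Object.entries(layer)) {
      if (value === undefined) continue // an undefined value never clears a lower layer
      const previous = merged[key]
      if (KEYWISE_FIELDS.has(key) && isPlainObject(previous) && isPlainObject(value)) {
        merged[key] = { ...previous, ...value } // key-by-key merge
      } else {
        merged[key] = value // higher layer replaces outright
      }
    }
  }
  return merged
}
```

Merging `{ headers: { 'X-App': 'demo' } }` with a stash layer of `{ headers: { 'X-Trace': 'abc' } }` keeps both headers, while a stash `model` simply replaces the constructor's.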
## ADK Control Fields

These fields configure the adapter's runtime behavior:

| Field | Type | Purpose |
| :--- | :--- | :--- |
| `model` | `string` | **Required.** Model identifier passed to the model endpoint. |
| `apiKey` | `string` | Bearer token for endpoint authentication. |
| `baseURL` | `string` | Endpoint URL. Defaults to `https://api.openai.com/v1`. |
| `headers` | `Record<string, string>` | Custom HTTP headers sent with every request. |
| `stream` | `boolean` | Toggles SSE streaming. Default `true`. |
| `streamIdleTimeoutMs` | `number` | Aborts request if the stream goes silent for this period. |
| `requestTimeoutMs` | `number` | Absolute timeout limit for the entire HTTP transaction. |
| `retry` | `ChatCompletionsRetryConfig` | Custom retry configuration for handling transient errors. |
| `fetch` | `typeof globalThis.fetch` | Custom HTTP fetch engine. |
| `contextWindow` | `number` | Total context budget; the adapter throws if this threshold is crossed. |
| `tokenEncoding` | `TokenEncoding \| null` | Token encoding used for local context calculations. Non-null requires `contextWindow`. |
| `selfIdentity` | `string` | Identifies the model for cleaning up raw reasoning traces. |
| `thoughtSurfacing` | `'all-self' \| 'latest-self' \| 'all'` | Controls which persisted thoughts are replayed into model history. |
| `replayCompatibility` | `ReadonlyArray<string>` | Forwards reasoning steps to compatibility-constrained endpoints. |
| `reasoningFieldPrecedence` | `ReadonlyArray<'reasoning' \| 'reasoning_content'>` | Order in which the adapter reads provider reasoning fields; on disagreement each surfaces as its own thought. Default `['reasoning', 'reasoning_content']`. |
| `bucketOrder` | `ChatCompletionsBucketOrder` | Sets the sorting order for system prompt segments. |
| `helpers` | `Partial<ChatCompletionsHelpers>` | Overrides specific translation steps. |
| `autoAck` | `boolean` | Automatically calls `ctx.ack()` after a tool-call-free response. Default `false`. |
| `strictToolChoice` | `boolean` | Halts execution if `tool_choice` demands an ephemeral artifact tool. Default `false`. |
| `unsupportedMediaPolicy` | `string` | Strategy when media inputs are incompatible with model modalities. Default `'throw'`. |

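`streamIdleTimeoutMs` and `requestTimeoutMs` in the table above measure different things: the idle timeout restarts its clock on every received chunk, while the request timeout bounds the whole transaction. A minimal sketch of the idle variant over any async iterable follows; `withIdleTimeout` and `StreamIdleTimeoutError` are hypothetical names, not ADK exports, and the adapter's own mechanism may differ (for example, by aborting the underlying HTTP request).

```typescript
// Hypothetical sketch of idle-timeout enforcement; not the adapter's code.
class StreamIdleTimeoutError extends Error {
  readonly idleMs: number
  constructor(idleMs: number) {
    super(`stream produced no data for ${idleMs} ms`)
    this.name = 'StreamIdleTimeoutError'
    this.idleMs = idleMs
  }
}

// Re-yields chunks from `source`, failing if any single wait for the next chunk exceeds `idleMs`.
async function* withIdleTimeout<T>(source: AsyncIterable<T>, idleMs: number): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]()
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined
      const idle = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new StreamIdleTimeoutError(idleMs)), idleMs)
      })
      // The clock restarts for every chunk and stops as soon as a chunk (or the end) arrives.
      const result = await Promise.race([iterator.next(), idle]).finally(() => clearTimeout(timer))
      if (result.done) return
      yield result.value
    }
  } finally {
    // Ask the source to clean up; not awaited, because a hung source may never settle.
    iterator.return?.().catch(() => undefined)
  }
}
```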
### autoAck

`autoAck` defaults to `false`. When `false`, the executor stores the assistant message and reports it, but does not call `ctx.ack()` — turn completion is the implementor's responsibility. This is the right default: auto-acking seizes turn-completion control from the output pipeline and prevents any quality gate (output filter, confidence check, human-in-the-loop approval) from running before the turn is declared done.

Set `autoAck: true` when you are building a single-shot executor with no output-side gate and you want the executor to own the full lifecycle. Every example on this page that wires an adapter directly into a `TurnRunner` sets `autoAck: true` so the turn ends after the first tool-call-free response. If you are building a pipeline that gates on output content, omit `autoAck` and call `ctx.ack()` yourself after your gate passes.

### reasoningFieldPrecedence

Model reasoning/thinking output is **not part of OpenAI's official Chat Completions spec**. OpenAI does not return it on Chat Completions and surfaces only reasoning summaries, and only on the Responses API. OpenAI-compatible providers therefore invented their own field names and disagree: Ollama's `/v1` and current vLLM emit `reasoning`, while legacy vLLM (≤0.8) and the DeepSeek API emit `reasoning_content`. The adapter reads **every** field named in `reasoningFieldPrecedence` that is present on the response, so a thinking model surfaces thoughts regardless of which convention its endpoint follows.

`reasoningFieldPrecedence` defaults to `['reasoning', 'reasoning_content']` (reasoning-first, matching Ollama and current vLLM). Precedence governs two things:

* When more than one listed field is present with **identical** content (or only one is present), a single thought is emitted, attributed to the highest-precedence field — this is the common case.
* When listed fields are present with **divergent** content, each surfaces as its **own** thought (ordered by precedence) rather than silently dropping one. In streaming mode both fields stream live as separate thought streams and are de-duplicated by content only at persistence time.

Reorder the array to prefer `reasoning_content` (e.g. `['reasoning_content', 'reasoning']`), or pass a single-element array to read exactly one field.

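For a complete (non-streaming) response, the rules above reduce to a short loop. `extractThoughts` is a hypothetical helper that mirrors only the documented behavior: read every listed field, collapse identical content onto the highest-precedence field, and keep divergent content as separate thoughts in precedence order. It ignores streaming and is not the adapter's implementation; skipping empty strings is an assumption of this sketch.

```typescript
// Hypothetical sketch of the documented precedence rules; not the adapter's code.
type ReasoningField = 'reasoning' | 'reasoning_content'

interface Thought {
  field: ReasoningField // which provider field the text came from
  text: string
}

function extractThoughts(
  message: Partial<Record<ReasoningField, string | null>>,
  precedence: ReadonlyArray<ReasoningField> = ['reasoning', 'reasoning_content'],
): Thought[] {
  const thoughts: Thought[] = []
  for (const field of precedence) {
    const text = message[field]
    // Missing, null, or empty fields are skipped (an assumption of this sketch).
    if (typeof text !== 'string' || text.length === 0) continue
    // Identical content already captured under a higher-precedence field is not repeated.
    if (thoughts.some((t) => t.text === text)) continue
    thoughts.push({ field, text })
  }
  return thoughts
}
```

Passing `['reasoning_content']` as the precedence reads that single field and ignores `reasoning` entirely.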
### Model Request Body Fields

Request body fields that the options schema supports, but that are not ADK control fields, are forwarded in the JSON request body:

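```typescript
import { OpenAIChatCompletionsAdapter } from '@nhtio/adk/batteries/llm/openai_chat_completions'

const adapter = new OpenAIChatCompletionsAdapter({
  model: 'gpt-4o',
  // Everything below is forwarded verbatim in the JSON request body.
  temperature: 0.7,
  max_completion_tokens: 2048,
  // JSON mode: OpenAI also requires the prompt itself to mention JSON.
  response_format: { type: 'json_object' },
  seed: 42,
  // reasoning_effort: 'high', // only accepted by reasoning models, not gpt-4o
})
```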

Supported fields include: `temperature`, `top_p`, `max_tokens`, `max_completion_tokens`, `stop`, `seed`, `presence_penalty`, `frequency_penalty`, `logit_bias`, `logprobs`, `top_logprobs`, `n`, `parallel_tool_calls`, `tool_choice`, `response_format`, `reasoning_effort`, `service_tier`, `store`, `metadata`, and `user`.

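Conceptually, this split is an allow-list: keys from the list above (plus `model`) go into the JSON body, while ADK control fields such as `apiKey`, `headers`, and `autoAck` stay on the client. `pickRequestBodyFields` is a hypothetical sketch of that idea, not the adapter's code; a real request also carries the computed `messages` (and `tools`, when tools are in play).

```typescript
// Hypothetical sketch: forward only allow-listed request-body fields.
const BODY_FIELDS = new Set([
  'model', 'temperature', 'top_p', 'max_tokens', 'max_completion_tokens', 'stop', 'seed',
  'presence_penalty', 'frequency_penalty', 'logit_bias', 'logprobs', 'top_logprobs', 'n',
  'parallel_tool_calls', 'tool_choice', 'response_format', 'reasoning_effort',
  'service_tier', 'store', 'metadata', 'user',
])

function pickRequestBodyFields(options: Record<string, unknown>): Record<string, unknown> {
  const body: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(options)) {
    // Control fields such as apiKey, headers, or autoAck never reach the wire body.
    if (BODY_FIELDS.has(key) && value !== undefined) body[key] = value
  }
  return body
}
```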
## Automatic Tool Forging

The Chat Completions adapter handles [`SpooledArtifact`](https://adk.nht.io/api/@nhtio/adk/spooled_artifact/classes/SpooledArtifact) tool forging internally — it calls `SpooledArtifact.forgeTools()` for you.

You do not need manual `.bindContext()` plumbing in your pipelines for local, iteration-scoped tools. On each dispatch iteration, the adapter merges the forged tools with [`ToolRegistry`](https://adk.nht.io/api/@nhtio/adk/forge/classes/ToolRegistry), via `ToolRegistry.merge([ctx.tools, ...forged], { onCollision: 'replace' })`, and then calls `mergedRegistry.bindContext(ctx)`.

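This page does not define `onCollision: 'replace'` beyond its name. Assuming it means that a tool from a later registry replaces an earlier tool with the same name (so forged artifact tools override same-named entries in `ctx.tools`), the merge behaves like this name-keyed sketch. `SketchTool` and `mergeReplace` are hypothetical and are not `ToolRegistry`'s API.

```typescript
// Hypothetical name-keyed merge illustrating "replace on collision"; not ToolRegistry's code.
interface SketchTool {
  name: string
  source: string // where the tool came from, for illustration
}

function mergeReplace(registries: ReadonlyArray<ReadonlyArray<SketchTool>>): SketchTool[] {
  const byName = new Map<string, SketchTool>()
  for (const registry of registries) {
    for (const tool of registry) {
      byName.set(tool.name, tool) // a later registry overwrites an earlier same-named tool
    }
  }
  return [...byName.values()]
}
```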
## Overriding Translation Helpers

The adapter uses 18 translation hooks defined under [`ChatCompletionsHelpers`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/openai_chat_completions/types/interfaces/ChatCompletionsHelpers) to format core ADK types into standard Chat Completions message payloads. You do not need to rewrite all 18 from scratch; pass the specific fields you want to override via `options.helpers`.

```typescript
import { OpenAIChatCompletionsAdapter } from '@nhtio/adk/batteries/llm/openai_chat_completions'

const adapter = new OpenAIChatCompletionsAdapter({
  model: 'gpt-4o',
  autoAck: true,
  helpers: {
    renderStandingInstructions: (items) => {
      // items is Iterable<Tokenizable>
      return Array.from(items, (item) => `[INSTRUCTION]: ${String(item)}`).join('\n')
    },
    renderUntrustedContent: (content, attrs) => {
      return `[UNTRUSTED DATA id=${attrs.nonce}]\n${content}\n[END UNTRUSTED]`
    },
  },
})
```

The available helper hooks:

| Helper Hook | Purpose |
| :--- | :--- |
| `renderUntrustedContent` | Fences third-party content using randomized nonces. |
| `renderTrustedContent` | Formats safe, first-party content blocks. |
| `renderStandingInstructions` | Compiles `Iterable<Tokenizable>` into a system prompt section. |
| `renderMemories` | Translates `Iterable<{ memory: Memory; attrs: MemoryAttrs }>` loaded memory records. |
| `renderRetrievableSafetyDirective` | Prepends instructions alerting the model to retrieval content boundaries. |
| `renderFirstPartyRetrievables` | Formats safe `Iterable<{ retrievable: Retrievable; attrs: RetrievableAttrs }>` records. |
| `renderThirdPartyPublicRetrievables` | Formats untrusted public search indexing records. |
| `renderThirdPartyPrivateRetrievables` | Formats restricted third-party data extractions. |
| `renderRetrievables` | Top-level dispatcher orchestrating the safe rendering of all retrievals. |
| `renderTimelineMessage` | Translates a single [`Message`](https://adk.nht.io/api/@nhtio/adk/common/classes/Message) timeline record. |
| `renderThought` | Encapsulates model-generated chain-of-thought metadata. |
| `filterThoughts` | Truncates or selects thoughts according to the configured `thoughtSurfacing` policy. |
| `toolsToChatCompletionsTools` | Formats ADK [`Tool`](https://adk.nht.io/api/@nhtio/adk/forge/classes/Tool) instances into API tool declarations. |
| `renderChatCompletionsSystemPrompt` | Concatenates all context blocks into the final primary system instructions. |
| `renderChatCompletionsToolCallResult` | Converts a tool call result into tool message content. |
| `descriptionToChatCompletionsJsonSchema` | Maps ADK type descriptions down to strict JSON schemas. |
| `buildChatCompletionsHistory` | Constructs the complete request message list, combining history, memories, system prompts, and tool sequences. |
| `createChatCompletionsToolCallDeltaAccumulator` | Accumulates streamed tool-call fragments into completed tool calls. |

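The job of the delta accumulator is easier to picture against the wire format. In an OpenAI-style stream, each chunk's `delta.tool_calls` carries fragments tagged with an `index`; the `id` and function name arrive in the first fragment for that index, and the JSON `arguments` string arrives in pieces. A minimal sketch keyed by `index` follows; `createToolCallAccumulator` is hypothetical and is not the signature of `createChatCompletionsToolCallDeltaAccumulator`.

```typescript
// Shape of one streamed tool-call fragment in the OpenAI Chat Completions wire format.
interface ToolCallDelta {
  index: number
  id?: string
  type?: 'function'
  function?: { name?: string; arguments?: string }
}

interface CompletedToolCall {
  id: string
  name: string
  arguments: string // raw JSON text; parse only after the stream ends
}

// Hypothetical accumulator keyed by `index`; not the adapter's implementation.
function createToolCallAccumulator() {
  const calls = new Map<number, CompletedToolCall>()
  return {
    push(deltas: ReadonlyArray<ToolCallDelta>): void {
      for (const delta of deltas) {
        const call = calls.get(delta.index) ?? { id: '', name: '', arguments: '' }
        if (delta.id) call.id = delta.id
        // OpenAI sends the function name once, in the first fragment for this index.
        if (delta.function?.name) call.name = delta.function.name
        // Argument text is split across fragments and must be concatenated in order.
        if (delta.function?.arguments) call.arguments += delta.function.arguments
        calls.set(delta.index, call)
      }
    },
    finish(): CompletedToolCall[] {
      return [...calls.entries()].sort(([a], [b]) => a - b).map(([, call]) => call)
    },
  }
}
```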
## The Battery as Reference Implementation

If you are determined to write a custom executor, study the [`OpenAIChatCompletionsAdapter`](https://adk.nht.io/api/@nhtio/adk/batteries/llm/openai_chat_completions/adapter/classes/OpenAIChatCompletionsAdapter) source first. It is the broadest execution loop in the codebase. Pay specific attention to:

* How configuration layers are merged and validated before the model is called.
* How context components (`ctx.turnMessages`, `ctx.turnMemories`, `ctx.turnRetrievables`, and `ctx.tools`) are merged dynamically.
* How SSE chunks are parsed, and how `streamIdleTimeoutMs` prevents silent hangs.
* How the executor reports messages, thoughts, and tool calls via `DispatchExecutorHelpers`.
* How the system ensures `ctx.ack()` and `ctx.nack()` are executed deterministically, especially when requests fail.

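For the SSE point in the list above, the core loop is small: buffer incoming text, split on newlines, parse each `data:` payload as JSON, and stop at the `data: [DONE]` sentinel that OpenAI-style endpoints send. `parseChatCompletionsSSE` is a hypothetical helper, not the adapter's code. It assumes the response body has already been decoded to text (for example with `TextDecoderStream`) and that each event carries a single `data:` line, which is how OpenAI-style endpoints send chunks; full SSE also allows multi-line `data:` fields, which this simplification does not join.

```typescript
// Hypothetical sketch of OpenAI-style SSE parsing; not the adapter's implementation.
// Input: text chunks as they arrive (chunk boundaries may split lines anywhere).
// Output: one parsed JSON payload per `data:` line, stopping at `data: [DONE]`.
async function* parseChatCompletionsSSE(chunks: AsyncIterable<string>): AsyncGenerator<unknown> {
  let buffer = ''
  for await (const chunk of chunks) {
    buffer += chunk
    let newline: number
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newline).replace(/\r$/, '') // tolerate CRLF line endings
      buffer = buffer.slice(newline + 1)
      // Skip comments (": keep-alive"), blank event separators, and non-data fields.
      if (!line.startsWith('data:')) continue
      const data = line.slice(5).trimStart()
      if (data === '[DONE]') return
      yield JSON.parse(data)
    }
  }
}
```

Wrapping the text source in an idle timeout before parsing is one way to get the silent-hang protection that `streamIdleTimeoutMs` provides.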