
@nhtio/adk/batteries/llm/webllm_chat_completions/adapter

Cross-environment executor adapter for WebLLM Chat Completions compatible endpoints.

Remarks

Cross-environment LLM adapter for the WebLLM Chat Completions wire shape. Chat Completions was chosen as the ADK's reference adapter because it is the de facto interchange format for the majority of OpenAI-compatible gateways (vLLM, Together, Groq, Fireworks, Ollama, Azure OpenAI, OpenRouter, DeepSeek, Mistral La Plateforme, and most self-hosted deployments). Its tool-call synthetic-history shape (a role: 'assistant' message carrying tool_calls: [...], followed by a role: 'tool' message carrying the matching tool_call_id) is the lowest common denominator that every conformant gateway accepts.
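As an illustration of that synthetic-history shape, the sketch below builds a three-message history and checks that every tool result answers a previously announced call. The local types, the get_weather tool, the call id, and the orphanedToolResults helper are all hypothetical and are not part of the adapter's API.

```typescript
// Minimal local types for the Chat Completions message shapes (illustrative only).
type ToolCallPart = {
  id: string
  type: 'function'
  // `arguments` is a JSON-encoded string, not an object.
  function: { name: string; arguments: string }
}

type ChatMessage =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: ToolCallPart[] }
  | { role: 'tool'; tool_call_id: string; content: string }

// Hypothetical tool name and call id, chosen for the example.
const history: ChatMessage[] = [
  { role: 'user', content: 'What is the weather in Paris?' },
  {
    role: 'assistant',
    content: null,
    tool_calls: [
      {
        id: 'call_1',
        type: 'function',
        function: { name: 'get_weather', arguments: JSON.stringify({ city: 'Paris' }) },
      },
    ],
  },
  { role: 'tool', tool_call_id: 'call_1', content: JSON.stringify({ tempC: 18 }) },
]

// Returns the tool_call_id of every tool message that does not answer a call
// announced by an earlier assistant message.
function orphanedToolResults(messages: ChatMessage[]): string[] {
  const announced = new Set<string>()
  const orphans: string[] = []
  for (const m of messages) {
    if (m.role === 'assistant') {
      for (const tc of m.tool_calls ?? []) announced.add(tc.id)
    } else if (m.role === 'tool' && !announced.has(m.tool_call_id)) {
      orphans.push(m.tool_call_id)
    }
  }
  return orphans
}
```

A history in which each role: 'tool' message follows the assistant message that announced its tool_call_id yields an empty list from this check.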

The adapter is built around three pluggable layers:

  1. Translation helpers: the thirteen swappable functions exported from ./helpers turn ADK primitives (@nhtio/adk!Tokenizable, @nhtio/adk!Memory, @nhtio/adk!Message, @nhtio/adk!Thought, @nhtio/adk!ToolCall, @nhtio/adk!Tool, @nhtio/adk!ArtifactTool, @nhtio/adk!SpooledArtifact) into Chat Completions wire shapes. Consumers override individual helpers via options.helpers.* to customise envelope formats, bucket ordering, thought surfacing, or JSON Schema generation without forking the adapter.
  2. Three-layer options merging: the constructor baseline, per-executor() overrides, and per-iteration ctx.stash.webLLMChatCompletions overrides combine with key-by-key precedence for helpers and wholesale replacement for everything else. The merged shape is re-validated on every iteration, so a malformed stash override fails loudly rather than silently.
  3. WebLLM engine invocation: the adapter accepts either a preloaded engine or a lazy createEngine factory. The resolved request body is passed directly to WebLLM's OpenAI-compatible chat API.
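The merge rule in the second layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter's implementation: the LayerOptions shape, the sampling field, the helper names a and b, and the validate check are all hypothetical. Only the precedence rule (helpers merge key by key, everything else is replaced wholesale, and the result is re-validated) comes from the description above.

```typescript
type Helper = (...args: unknown[]) => unknown
type Helpers = Record<string, Helper>

// Hypothetical, simplified option shape; the real options are richer.
interface LayerOptions {
  stream?: boolean
  sampling?: { temperature?: number; top_p?: number }
  helpers?: Helpers
}

type MergedOptions = Omit<LayerOptions, 'helpers'> & { helpers: Helpers }

// Stand-in for schema validation: throws instead of silently accepting bad input.
function validate(options: MergedOptions): void {
  const t = options.sampling?.temperature
  if (t !== undefined && (typeof t !== 'number' || Number.isNaN(t))) {
    throw new TypeError('sampling.temperature must be a number')
  }
}

// Later layers win. Helpers merge key by key; every other key is replaced
// wholesale, so nested objects are NOT deep-merged.
function mergeLayers(...layers: Array<LayerOptions | undefined>): MergedOptions {
  const merged: MergedOptions = { helpers: {} }
  for (const layer of layers) {
    if (!layer) continue
    const { helpers, ...rest } = layer
    Object.assign(merged, rest)
    merged.helpers = { ...merged.helpers, ...helpers }
  }
  validate(merged)
  return merged
}

const constructorOpts: LayerOptions = {
  sampling: { temperature: 0.2, top_p: 0.9 },
  helpers: { a: () => 'a0', b: () => 'b0' },
}
const executorOpts: LayerOptions = { helpers: { b: () => 'b1' } }
const stashOpts: LayerOptions = { sampling: { temperature: 1 } }

const merged = mergeLayers(constructorOpts, executorOpts, stashOpts)
// merged.helpers keeps `a` from the constructor and takes `b` from the executor layer.
// merged.sampling is exactly { temperature: 1 }: top_p is dropped by wholesale replacement.
```

Replacing non-helper keys wholesale keeps each layer's intent unambiguous: an override either supplies a complete value or leaves the earlier one untouched.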

Per-iteration flow, in nine steps:

  1. Merge constructor / executor / stash options and re-validate.
  2. Resolve helpers, falling back to bundled default* for each unset field.
  3. Forge artifact-query tools by walking ctx.turnToolCalls, collecting unique SpooledArtifact constructors, calling <Ctor>.forgeTools(ctx) on each, and merging the results with ctx.tools.
  4. Pre-render every persisted tool-call result into the prompt-ready string the timeline will use, cached by tc.id.
  5. When tokenEncoding !== null, sum the token weight of every persisted bucket and throw @nhtio/adk/batteries!E_WEBLLM_CHAT_COMPLETIONS_CONTEXT_OVERFLOW when the total exceeds contextWindow.
  6. Build the request body via buildChatCompletionsHistory; carry vendor-opaque reasoning blocks through the _adk_reasoning_payloads side-channel.
  7. Resolve or lazily create a WebLLM engine and call engine.chat.completions.create(body).
  8. Streaming path: consume WebLLM's async chunk iterable; surface deltas through helpers.reportMessage / reportThought / reportToolCall; assemble tool-call deltas via the accumulator; persist Message / Thought / ToolCall records on stream end.
  9. Non-streaming path: consume the returned Chat Completion object; same persistence + tool-execution loop.
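The tool-call delta assembly mentioned in step 8 can be sketched as below. In a Chat Completions stream, tool calls arrive as fragments keyed by index, with the arguments string split across chunks. The ToolCallAccumulator class and its method names are hypothetical and stand in for whatever accumulator the adapter actually uses. The sketch also assumes that name fragments, like argument fragments, should be concatenated.

```typescript
// Shape of one entry in choices[0].delta.tool_calls of a streamed chunk.
interface ToolCallDelta {
  index: number
  id?: string
  function?: { name?: string; arguments?: string }
}

interface AssembledToolCall {
  id: string
  name: string
  arguments: string // JSON-encoded string, complete only once the stream ends
}

// Hypothetical accumulator: groups fragments by index and concatenates them.
class ToolCallAccumulator {
  private readonly slots = new Map<number, AssembledToolCall>()

  push(delta: ToolCallDelta): void {
    const slot = this.slots.get(delta.index) ?? { id: '', name: '', arguments: '' }
    if (delta.id) slot.id = delta.id
    // Assumption: name fragments are appended, like argument fragments.
    if (delta.function?.name) slot.name += delta.function.name
    if (delta.function?.arguments) slot.arguments += delta.function.arguments
    this.slots.set(delta.index, slot)
  }

  // Returns the assembled calls ordered by their stream index.
  finish(): AssembledToolCall[] {
    return Array.from(this.slots.entries())
      .sort(([a], [b]) => a - b)
      .map(([, call]) => call)
  }
}

// Interleaved fragments for two calls, as they might arrive across chunks.
const accumulator = new ToolCallAccumulator()
accumulator.push({ index: 0, id: 'call_1', function: { name: 'get_weather', arguments: '' } })
accumulator.push({ index: 0, function: { arguments: '{"city":' } })
accumulator.push({ index: 1, id: 'call_2', function: { name: 'get_time', arguments: '{}' } })
accumulator.push({ index: 0, function: { arguments: '"Paris"}' } })

const assembledCalls = accumulator.finish()
```

In a streaming loop, each chunk's choices[0].delta.tool_calls entries would be passed to push, and finish would be called once the stream ends, before the arguments are parsed as JSON.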

Classes

Class | Description
WebLLMChatCompletionsAdapter | Opinionated cross-environment LLM adapter for the WebLLM Chat Completions wire shape.