Variable: TokenEncoding

const TokenEncoding: readonly [
  "gpt2",
  "r50k_base",
  "p50k_base",
  "p50k_edit",
  "cl100k_base",
  "o200k_base",
  "gemini",
  "gemma",
  "llama2",
  "claude",
];

Defined in: src/lib/classes/tokenizable.ts:58

The set of supported token encoding identifiers.
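
Because the value is declared as a readonly tuple of string literals, a union type and a runtime guard can both be derived from it. The following is a minimal sketch, not part of the library: the tuple is copied locally so the snippet runs on its own, and `BuiltInEncoding` and `isBuiltInEncoding` are hypothetical names introduced for illustration.

```typescript
// Local copy of the tuple shown above, for illustration only; real code
// would import TokenEncoding from the library instead.
const TokenEncoding = [
  "gpt2",
  "r50k_base",
  "p50k_base",
  "p50k_edit",
  "cl100k_base",
  "o200k_base",
  "gemini",
  "gemma",
  "llama2",
  "claude",
] as const;

// Hypothetical alias: the union of the tuple's literal members.
type BuiltInEncoding = (typeof TokenEncoding)[number];

// Hypothetical helper: narrows an arbitrary string to a built-in identifier.
function isBuiltInEncoding(value: string): value is BuiltInEncoding {
  return (TokenEncoding as readonly string[]).includes(value);
}

const fromConfig: string = "cl100k_base";
if (isBuiltInEncoding(fromConfig)) {
  const encoding: BuiltInEncoding = fromConfig; // narrowed by the guard
  console.log(`built-in encoding: ${encoding}`);
}
```

A guard like this is useful when the identifier arrives as an untyped string, for example from a config file, and needs to be checked against the closed set before use.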

Remarks

Each value maps to a specific estimation backend:

gpt2, r50k_base, p50k_base, p50k_edit, cl100k_base, o200k_base: exact counts via js-tiktoken (OpenAI / tiktoken-compatible models).
gemini: exact counts via @lenml/tokenizer-gemini, which bundles Gemini's actual SentencePiece vocabulary locally, so no API call is required.
gemma: exact counts for Google's Gemma models (Gemma 2/3/4, including the on-device .litertlm / ONNX builds). Backed by the same @lenml/tokenizer-gemini package: its bundled tokenizer_config.json declares "tokenizer_class": "GemmaTokenizer" over the 256k-vocabulary SentencePiece tokenizer that Gemini and Gemma share, and it encodes Gemma's control tokens (<start_of_turn>, <end_of_turn>, <eos>, …) as single ids. The reuse is deliberate, not a proxy, and adds no extra dependency. The identifier is distinct so that callers can state which model they mean.
llama2: exact counts via llama-tokenizer-js (Llama 1 and 2). Llama 3+ uses a different vocabulary and should use the llama3 identifier, which is not yet part of this set, once a suitable sync backend is available.
claude: heuristic approximation using Anthropic's published ~3.5 chars/token ratio. No local tokenizer is available for Claude 3+ models; the Anthropic SDK's messages.countTokens() API is the only exact path, but it requires a network call.
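
The claude heuristic amounts to dividing the text length by the characters-per-token ratio. The sketch below illustrates that arithmetic under stated assumptions: `approximateClaudeTokens` is a hypothetical helper, and both the round-up and the use of `String.length` as the character count are guesses that the library's actual implementation may not share.

```typescript
// Ratio taken from the description above (~3.5 characters per token).
const CLAUDE_CHARS_PER_TOKEN = 3.5;

// Hypothetical helper; the library's own rounding and length measure may differ.
function approximateClaudeTokens(text: string): number {
  if (text.length === 0) return 0;
  // String.length counts UTF-16 code units, which is only a rough proxy
  // for characters in text containing emoji or other astral symbols.
  return Math.ceil(text.length / CLAUDE_CHARS_PER_TOKEN);
}

// 13 characters / 3.5 = 3.71..., rounded up to 4
console.log(approximateClaudeTokens("Hello, world!"));
```

Rounding up makes the estimate slightly conservative, which is usually the safer direction when the result is used to stay inside a context window.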

This array is the canonical, closed set of backends built into core. Adding a backend to this set requires editing core (add a case to the internal switch in Tokenizable.estimateTokens). For any other encoding that a battery or consumer wants to measure, such as a model-specific tokenizer that core has no business knowing about, call registerTokenEstimator instead; no core edit is required. See TokenEncodingId for the widened identifier type that accepts both.
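
The split between a closed switch and an open registry can be modelled in a few lines. This is a self-contained sketch of the pattern only: `registerEstimator`, `EncodingId`, and `estimateTokens` below are local stand-ins for registerTokenEstimator, TokenEncodingId, and Tokenizable.estimateTokens, whose real signatures are not shown on this page and may differ. The arithmetic inside the switch is a placeholder, not a real tokenizer.

```typescript
// Stand-in for the built-in set, trimmed to two entries to keep the sketch short.
const BUILT_IN = ["cl100k_base", "claude"] as const;
type BuiltIn = (typeof BUILT_IN)[number];

// Stand-in for a widened identifier type: built-ins plus any other string.
// The `string & {}` form keeps editor autocomplete for the literal members.
type EncodingId = BuiltIn | (string & {});

type Estimator = (text: string) => number;

// Registry for encodings that core does not know about.
const customEstimators = new Map<string, Estimator>();

// Hypothetical signature; the real registerTokenEstimator may differ.
function registerEstimator(id: string, estimator: Estimator): void {
  customEstimators.set(id, estimator);
}

function estimateTokens(text: string, encoding: EncodingId): number {
  // Closed path: one case per built-in backend (placeholder arithmetic).
  switch (encoding) {
    case "cl100k_base":
      return Math.ceil(text.length / 4);
    case "claude":
      return Math.ceil(text.length / 3.5);
  }
  // Open path: anything registered at runtime.
  const custom = customEstimators.get(encoding);
  if (custom === undefined) {
    throw new Error(`No estimator registered for "${encoding}"`);
  }
  return custom(text);
}

// A consumer adds its own encoding without touching the switch.
registerEstimator("whitespace-words", (text) =>
  text.split(/\s+/).filter((word) => word.length > 0).length,
);

console.log(estimateTokens("one two  three", "whitespace-words")); // 3
```

Under this arrangement, the switch stays small and auditable, while model-specific tokenizers live with the code that needs them.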
