Variable: TokenEncoding
ts
const TokenEncoding: readonly [
"gpt2",
"r50k_base",
"p50k_base",
"p50k_edit",
"cl100k_base",
"o200k_base",
"gemini",
"llama2",
"claude",
];

The set of supported token encoding identifiers.
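Because the value is a readonly tuple of string literals, a union type and a runtime guard can be derived from it. The following is a minimal sketch, not part of the documented API: the names `TokenEncodingName` and `isTokenEncoding` are hypothetical, and the `as const` declaration is an assumption about how the tuple is defined.

```typescript
// Assumed to mirror the declaration above; `as const` keeps the literal types.
const TokenEncoding = [
  "gpt2",
  "r50k_base",
  "p50k_base",
  "p50k_edit",
  "cl100k_base",
  "o200k_base",
  "gemini",
  "llama2",
  "claude",
] as const;

// Hypothetical alias: the union of all supported identifiers.
type TokenEncodingName = (typeof TokenEncoding)[number];

// Hypothetical guard: narrows an arbitrary string to a supported identifier.
function isTokenEncoding(value: string): value is TokenEncodingName {
  return (TokenEncoding as readonly string[]).indexOf(value) !== -1;
}
```

A guard like this is useful when the identifier comes from user input or configuration rather than from a literal in code.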
Remarks
Each value maps to a specific estimation backend:
- gpt2, r50k_base, p50k_base, p50k_edit, cl100k_base, o200k_base: exact counts via js-tiktoken (OpenAI / tiktoken-compatible models).
- gemini: exact counts via @lenml/tokenizer-gemini, which embeds Gemini's actual SentencePiece vocabulary locally, so no API call is required.
- llama2: exact counts via llama-tokenizer-js (Llama 1 and 2). Llama 3+ uses a different vocabulary and should use the llama3 identifier once a suitable sync backend is available.
- claude: heuristic approximation using Anthropic's published ~3.5 chars/token ratio. No local tokenizer is available for Claude 3+ models; the Anthropic SDK's messages.countTokens() API is the only exact path, but it requires a network call.
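The claude heuristic can be illustrated with a short sketch. This is an assumption about how a chars-per-token estimate might be computed, not the library's actual implementation: the function name `estimateClaudeTokens`, the constant name, and the choice to round up are all hypothetical.

```typescript
// Ratio taken from the remarks above (~3.5 characters per token).
const CLAUDE_CHARS_PER_TOKEN = 3.5;

// Hypothetical helper: approximates a token count from text length.
// Note that `text.length` counts UTF-16 code units, not Unicode code points,
// so the estimate drifts for text with many astral-plane characters.
function estimateClaudeTokens(text: string): number {
  if (text.length === 0) {
    return 0;
  }
  // Rounding up is an assumption; it avoids reporting 0 tokens for short input.
  return Math.ceil(text.length / CLAUDE_CHARS_PER_TOKEN);
}
```

An estimate like this is only suitable for budgeting and rough limits. When an exact count matters, the network-based path described above is the one to use.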
When adding a new encoding, add a case to Tokenizable.estimateTokens.
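One way to make a missing case fail at compile time is an exhaustive switch with a `never` check. The sketch below is hedged: the signature of Tokenizable.estimateTokens is not shown in this page, so it uses a hypothetical standalone function `backendFor` that only returns the backend name from the remarks, and the `TokenEncodingName` alias is likewise an assumption.

```typescript
// Assumed to mirror the documented tuple.
const TokenEncoding = [
  "gpt2",
  "r50k_base",
  "p50k_base",
  "p50k_edit",
  "cl100k_base",
  "o200k_base",
  "gemini",
  "llama2",
  "claude",
] as const;

// Hypothetical alias for the union of identifiers.
type TokenEncodingName = (typeof TokenEncoding)[number];

// Hypothetical dispatcher: maps each identifier to the backend named in the remarks.
function backendFor(encoding: TokenEncodingName): string {
  switch (encoding) {
    case "gpt2":
    case "r50k_base":
    case "p50k_base":
    case "p50k_edit":
    case "cl100k_base":
    case "o200k_base":
      return "js-tiktoken";
    case "gemini":
      return "@lenml/tokenizer-gemini";
    case "llama2":
      return "llama-tokenizer-js";
    case "claude":
      return "heuristic";
    default: {
      // If a new identifier is added to the tuple without a case here,
      // this assignment stops type-checking.
      const unhandled: never = encoding;
      throw new Error("Unhandled encoding: " + String(unhandled));
    }
  }
}
```

With this pattern, adding an identifier to the tuple without a matching case produces a type error at the `never` assignment instead of a silent fallthrough at runtime.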