Class: Tokenizable

A mutable string with a built-in token counter.

Remarks

The wrapped string can be read via the standard coercion protocol and updated at any time via Tokenizable.set. Token counts are computed lazily on first access per encoding and cached until the value changes, avoiding redundant encoder invocations when the same content is measured multiple times across a pipeline.
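The caching behaviour described above can be pictured with a small stand-in class. This is a sketch under stated assumptions, not the library's source: `CachedCounter`, `encoderCalls`, and `runEncoder` are hypothetical names, and the placeholder encoder simply applies the `ceil(length / 4)` heuristic to every encoding.

```typescript
// Hypothetical stand-in illustrating lazy, per-encoding caching.
// Only `set` and `estimateTokens` mirror names from the documented API.
class CachedCounter {
  private value: string;
  private cache = new Map<string, number>();
  // Counts how often the (potentially expensive) encoder actually runs.
  public encoderCalls = 0;

  constructor(value: string) {
    this.value = value;
  }

  set(value: string): void {
    this.value = value;
    // A new value invalidates every cached count.
    this.cache.clear();
  }

  estimateTokens(encoding: string): number {
    const cached = this.cache.get(encoding);
    if (cached !== undefined) return cached;
    const count = this.runEncoder(encoding);
    this.cache.set(encoding, count);
    return count;
  }

  // Placeholder encoder (assumption): ceil(length / 4) for every encoding.
  private runEncoder(_encoding: string): number {
    this.encoderCalls += 1;
    return Math.ceil(this.value.length / 4);
  }
}

const counter = new CachedCounter("hello world");
counter.estimateTokens("cl100k_base"); // encoder runs
counter.estimateTokens("cl100k_base"); // served from the cache
counter.estimateTokens("o200k_base");  // different encoding: encoder runs again
counter.set("a longer replacement string");
counter.estimateTokens("cl100k_base"); // cache was cleared: encoder runs again
// counter.encoderCalls is now 3
```

Keying the cache by encoding means that measuring the same content under two encodings costs two encoder runs, while repeated measurements under one encoding cost a single run until `set` is called.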

Estimation is dispatched by encoding identifier — see TokenEncoding for the full list of supported backends and their accuracy characteristics. Unrecognised encodings fall back to a ceil(length / 4) character heuristic.
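As a rough illustration of the fallback, the heuristic amounts to a single expression. The function name `fallbackEstimate` is hypothetical, and the sketch assumes that "length" means JavaScript's `String.prototype.length` (UTF-16 code units), which the text above does not state explicitly.

```typescript
// Hypothetical helper mirroring the documented ceil(length / 4) fallback.
// Assumption: "length" is the string's UTF-16 code unit count.
function fallbackEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

const emptyEstimate = fallbackEstimate("");      // 0
const exactEstimate = fallbackEstimate("abcd");  // 1
const roundedEstimate = fallbackEstimate("abcde"); // 2 (rounds up)
```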

The class implements the standard JS value-coercion protocol (toString, valueOf, toJSON, toLocaleString, Symbol.for('nodejs.util.inspect.custom')) so instances behave transparently as strings in most contexts.
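To show what this coercion protocol buys in practice, here is a minimal sketch using a hypothetical `StringBox` class that implements the same four methods. It is not the library's implementation, and the `util.inspect` hook is left out for brevity.

```typescript
// Hypothetical wrapper implementing the standard coercion methods.
class StringBox {
  private value: string;

  constructor(value: string) {
    this.value = value;
  }

  toString(): string { return this.value; }
  valueOf(): string { return this.value; }
  toJSON(): string { return this.value; }
  toLocaleString(): string { return this.value; }
}

const box = new StringBox("hi");

const viaTemplate = `${box}!`;                   // "hi!"  (uses toString)
const viaConcat = box + "!";                     // "hi!"  (uses valueOf)
const viaString = String(box);                   // "hi"
const viaJson = JSON.stringify({ prompt: box }); // '{"prompt":"hi"}' (uses toJSON)
```

Note that `typeof box` is still `"object"`, which is one reason such instances behave as strings in most contexts rather than all of them.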

Constructors

Constructor

```ts
new Tokenizable(value: string): Tokenizable;
```

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `value` | `string` | The initial string value to wrap. |

Returns

Tokenizable

Properties

| Property | Modifier | Type | Default value | Description |
| --- | --- | --- | --- | --- |
| `estimateTokens` | `public` | `(encoding: "gpt2" \| "r50k_base" \| "p50k_base" \| "p50k_edit" \| "cl100k_base" \| "o200k_base" \| "gemini" \| "llama2" \| "claude") => number` | `undefined` | - |
| `set` | `public` | `(value: string) => void` | `undefined` | - |
| `toJSON` | `public` | `() => string` | `undefined` | - |
| `toLocaleString` | `public` | `() => string` | `undefined` | - |
| `toString` | `public` | `() => string` | `undefined` | - |
| `valueOf` | `public` | `() => string` | `undefined` | - |
| `schema` | `static` | `AlternativesSchema<any>` | `stringOrTokenizableSchema` | Validator schema that accepts a plain string or a `Tokenizable` instance. **Remarks:** Reusable fragment for any schema that wants to accept either form, for example `systemPrompt` and each item in `standingInstructions` in `turnContextSchema`. |
| `TokenEncoding` | `static` | `readonly ["gpt2", "r50k_base", "p50k_base", "p50k_edit", "cl100k_base", "o200k_base", "gemini", "llama2", "claude"]` | `TokenEncoding` | - |

Methods

estimateTokens()

```ts
static estimateTokens(value: string, encoding:
  | "gpt2"
  | "r50k_base"
  | "p50k_base"
  | "p50k_edit"
  | "cl100k_base"
  | "o200k_base"
  | "gemini"
  | "llama2"
  | "claude"): number;
```

Convenience overload for one-off token counting without managing a Tokenizable instance.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `value` | `string` | The string to count tokens for. |
| `encoding` | `"gpt2" \| "r50k_base" \| "p50k_base" \| "p50k_edit" \| "cl100k_base" \| "o200k_base" \| "gemini" \| "llama2" \| "claude"` | The encoding identifier to use for counting. |

Returns

number

The estimated number of tokens.

Remarks

Creates a temporary instance and immediately discards it — no caching benefit. Use the instance method when you need to count the same value under multiple encodings or when the value may change over time.
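The trade-off between the static and instance forms can be sketched with a hypothetical stand-in, `OneShotCounter`, whose static method delegates to a throwaway instance as the remark describes. The `instancesCreated` counter and the `ceil(length / 4)` placeholder encoder are assumptions made only for this illustration.

```typescript
// Hypothetical stand-in: the static method builds and discards an instance.
class OneShotCounter {
  static instancesCreated = 0;
  private value: string;
  private cache = new Map<string, number>();

  constructor(value: string) {
    this.value = value;
    OneShotCounter.instancesCreated += 1;
  }

  estimateTokens(encoding: string): number {
    let count = this.cache.get(encoding);
    if (count === undefined) {
      // Placeholder encoder (assumption): ceil(length / 4).
      count = Math.ceil(this.value.length / 4);
      this.cache.set(encoding, count);
    }
    return count;
  }

  // One-off counting: the temporary instance and its cache are discarded.
  static estimateTokens(value: string, encoding: string): number {
    return new OneShotCounter(value).estimateTokens(encoding);
  }
}

const systemPromptText = "You are a helpful assistant.";

// Two static calls build two throwaway instances, so nothing is reused.
OneShotCounter.estimateTokens(systemPromptText, "cl100k_base");
OneShotCounter.estimateTokens(systemPromptText, "cl100k_base");
const afterStaticCalls = OneShotCounter.instancesCreated; // 2

// One long-lived instance serves several encodings and repeat reads.
const reusable = new OneShotCounter(systemPromptText);
reusable.estimateTokens("cl100k_base");
reusable.estimateTokens("o200k_base");
reusable.estimateTokens("cl100k_base"); // cached
const afterInstanceCalls = OneShotCounter.instancesCreated; // 3
```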


isTokenizable()

```ts
static isTokenizable(value: unknown): value is Tokenizable;
```

Returns true if value is a Tokenizable instance.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `value` | `unknown` | The value to test. |

Returns

value is Tokenizable

true when value is a Tokenizable instance.

Remarks

Uses isInstanceOf from @nhtio/adk for cross-realm safety: a plain instanceof check would fail for instances created in a different module copy or VM context.
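The failure mode is easy to reproduce without a second VM context. The sketch below simulates two copies of a module by defining the same class twice, then contrasts `instanceof` with a brand check keyed on the global symbol registry. The brand approach is one common technique and is an assumption here; how `isInstanceOf` works internally is not described in this page. `BRAND`, `defineTokenizable`, and `isBranded` are hypothetical names.

```typescript
// Symbol.for reads from the global registry, so every module copy
// that asks for this key receives the same symbol.
const BRAND = Symbol.for("example.tokenizable.brand"); // hypothetical key

// Each call returns a distinct class, simulating a duplicated module copy.
function defineTokenizable() {
  return class Tokenizable {
    value: string;
    constructor(value: string) {
      this.value = value;
      Object.defineProperty(this, BRAND, { value: true });
    }
  };
}

const CopyA = defineTokenizable();
const CopyB = defineTokenizable();

// Brand check that does not depend on prototype identity.
function isBranded(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    Reflect.get(value, BRAND) === true
  );
}

const fromB = new CopyB("hello");

const viaInstanceof = fromB instanceof CopyA; // false: different prototype chain
const viaBrand = isBranded(fromB);            // true: the brand is shared
const plainString = isBranded("hello");       // false: not an object
```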