pymllm.mem_cache.base_prefix_cache

Abstract base class and shared data types for prefix cache implementations.

All concrete caches (RadixCache, ChunkCache, MambaRadixCache) inherit from BasePrefixCache and share the data classes defined here.

Classes

`RadixKey`	Compound lookup key: token-id sequence + optional namespace tag.
`MatchResult`	Returned by `BasePrefixCache.match_prefix()`.
`InsertResult`	Returned by `BasePrefixCache.insert()`.
`EvictResult`	Returned by `BasePrefixCache.evict()`.
`BasePrefixCache`	Abstract interface for all prefix cache implementations.

Functions

`hash_token_ids`(token_ids[, prior_hash])	SHA-256 hash of a token-id page with optional chain-hash.
`hash_to_int64`(hex_str)	Convert a hex digest to a signed 64-bit integer (first 16 hex chars).
`hash_bytes`(data)	SHA-256 -> unsigned 64-bit int. Useful for multimodal embedding keys.

Module Contents

pymllm.mem_cache.base_prefix_cache.hash_token_ids(token_ids, prior_hash=None)

SHA-256 hash of a token-id page with optional chain-hash.

Each token is encoded as a 4-byte little-endian unsigned integer; a tuple token (bigram / EAGLE) contributes each of its elements in order. When prior_hash is supplied, the digest is seeded with the raw bytes of the previous hash, which makes the result position-aware.

Parameters:

token_ids (List[Union[int, Tuple[int, ...]]])
prior_hash (Optional[str])

Return type:

str
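The encoding described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: the name `sketch_hash_token_ids` is hypothetical, and it assumes that the returned string is a hex digest and that `prior_hash` is decoded from hex before seeding the hasher.

```python
import hashlib
import struct
from typing import List, Optional, Tuple, Union

Token = Union[int, Tuple[int, ...]]


def sketch_hash_token_ids(token_ids: List[Token],
                          prior_hash: Optional[str] = None) -> str:
    """Hypothetical re-implementation of the documented behaviour."""
    hasher = hashlib.sha256()
    if prior_hash is not None:
        # Assumption: the previous hash is a hex string; seed with its raw bytes.
        hasher.update(bytes.fromhex(prior_hash))
    for tok in token_ids:
        # A tuple token contributes each of its elements in order.
        parts = tok if isinstance(tok, tuple) else (tok,)
        for part in parts:
            # 4-byte little-endian unsigned integer per token id.
            hasher.update(struct.pack("<I", part))
    return hasher.hexdigest()


# Chain two pages: the second digest depends on the first.
page0 = sketch_hash_token_ids([1, 2, 3])
page1 = sketch_hash_token_ids([4, 5, 6], prior_hash=page0)
```

Under these assumptions, the same page hashed without `prior_hash` yields a different digest than `page1`, which is what makes the chained hash position-aware.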

pymllm.mem_cache.base_prefix_cache.hash_to_int64(hex_str)

Convert a hex digest to a signed 64-bit integer (first 16 hex chars).

Parameters: hex_str (str)
Return type: int
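One plausible way to perform this conversion is shown below. The function name `sketch_hash_to_int64` is hypothetical, and the two's-complement reinterpretation of the top bit is an assumption; the source only states that the first 16 hex characters become a signed 64-bit integer.

```python
def sketch_hash_to_int64(hex_str: str) -> int:
    """Hypothetical sketch: first 16 hex chars -> signed 64-bit integer."""
    # 16 hex characters carry exactly 64 bits.
    value = int(hex_str[:16], 16)
    # Assumption: values with the top bit set wrap to negative (two's complement).
    if value >= 1 << 63:
        value -= 1 << 64
    return value


print(sketch_hash_to_int64("0" * 64))  # 0
print(sketch_hash_to_int64("f" * 64))  # -1
```

A signed result fits into fixed-width integer containers such as a 64-bit tensor element, which is a likely reason to prefer it over the unsigned form.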

pymllm.mem_cache.base_prefix_cache.hash_bytes(data)

SHA-256 -> unsigned 64-bit int. Useful for multimodal embedding keys.

Parameters: data (bytes)
Return type: int
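A minimal sketch of such a reduction follows. The source does not say which part of the digest is kept or in which byte order, so taking the first 8 bytes as a big-endian value is purely an assumption, and `sketch_hash_bytes` is a hypothetical name.

```python
import hashlib


def sketch_hash_bytes(data: bytes) -> int:
    """Hypothetical sketch: SHA-256 digest reduced to an unsigned 64-bit int."""
    digest = hashlib.sha256(data).digest()
    # Assumption: keep the first 8 bytes, interpreted as big-endian unsigned.
    return int.from_bytes(digest[:8], byteorder="big", signed=False)


# Example: derive a compact key from raw embedding bytes (placeholder payload).
key = sketch_hash_bytes(b"image-embedding-bytes")
```

The result always lies in the range 0 to 2**64 - 1 and is stable across runs for the same input bytes.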

class pymllm.mem_cache.base_prefix_cache.RadixKey(token_ids, extra_key=None)

Compound lookup key: token-id sequence + optional namespace tag.

extra_key isolates independent namespaces so that sequences with identical leading tokens but different adapters / LoRA ids / multimodal context hashes never share prefix nodes.

Parameters:

token_ids (List[Union[int, Tuple[int, ...]]])
extra_key (Optional[str])

__slots__ = ('token_ids', 'extra_key')

token_ids

extra_key = None

__len__()

Return type: int

__iter__()

Return type: Iterator

__getitem__(idx)

Parameters: idx (Union[int, slice])
Return type: RadixKey

__repr__()

Return type: str
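The listed special methods suggest a thin wrapper around the token list. The stand-in class below, `SketchRadixKey`, is hypothetical and only mirrors the documented signatures; in particular, wrapping an integer index in a one-element key and carrying `extra_key` over to the sliced key are assumptions based on the documented `RadixKey` return type.

```python
from typing import Iterator, List, Optional, Tuple, Union

Token = Union[int, Tuple[int, ...]]


class SketchRadixKey:
    """Hypothetical stand-in mirroring the documented RadixKey interface."""

    __slots__ = ("token_ids", "extra_key")

    def __init__(self, token_ids: List[Token], extra_key: Optional[str] = None):
        self.token_ids = token_ids
        self.extra_key = extra_key

    def __len__(self) -> int:
        return len(self.token_ids)

    def __iter__(self) -> Iterator:
        return iter(self.token_ids)

    def __getitem__(self, idx: Union[int, slice]) -> "SketchRadixKey":
        # Assumption: indexing always yields a key in the same namespace.
        if isinstance(idx, int):
            return SketchRadixKey([self.token_ids[idx]], self.extra_key)
        return SketchRadixKey(self.token_ids[idx], self.extra_key)

    def __repr__(self) -> str:
        return (f"SketchRadixKey(extra_key={self.extra_key!r}, "
                f"token_ids={self.token_ids!r})")


key = SketchRadixKey([10, 11, 12, 13], extra_key="lora-a")
suffix = key[2:]  # still tagged with the "lora-a" namespace
```

Keeping the namespace tag on every slice means a partially matched key can be passed further down a tree without losing its isolation from other adapters.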

class pymllm.mem_cache.base_prefix_cache.MatchResult

Returned by BasePrefixCache.match_prefix().

indices: torch.Tensor

last_node: Any = None

prefix_len: int = 0

mamba_branching_seqlen: int | None = None

class pymllm.mem_cache.base_prefix_cache.InsertResult

Returned by BasePrefixCache.insert().

prefix_len: int = 0

last_node: Any = None

mamba_exist: bool = False

class pymllm.mem_cache.base_prefix_cache.EvictResult

Returned by BasePrefixCache.evict().

full_evicted: int = 0

swa_evicted: int = 0

mamba_evicted: int = 0

class pymllm.mem_cache.base_prefix_cache.BasePrefixCache

Bases: abc.ABC

Abstract interface for all prefix cache implementations.

Concrete implementations:

RadixCache – radix-tree with SWA tombstone support
ChunkCache – no-op fallback (disable_radix_cache=True)
MambaRadixCache – radix-tree with independent Mamba/SSM state tracking

abstractmethod reset()

Clear all cached state and re-initialise.

Return type: None

abstractmethod match_prefix(key)

Find the longest cached prefix of key.

Parameters: key (RadixKey)
Return type: MatchResult
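To illustrate what "longest cached prefix" means, here is a toy per-token trie with one root per namespace. It is a sketch only: `ToyPrefixTrie` is a hypothetical class, it returns a bare match length instead of a `MatchResult`, and it does not compress edges the way a real radix tree would.

```python
from typing import Dict, List, Optional


class _Node:
    """One trie node; each child edge is a single token id."""

    def __init__(self) -> None:
        self.children: Dict[int, "_Node"] = {}


class ToyPrefixTrie:
    """Hypothetical toy: longest-prefix lookup, one root per extra_key."""

    def __init__(self) -> None:
        self._roots: Dict[Optional[str], _Node] = {}

    def insert(self, token_ids: List[int], extra_key: Optional[str] = None) -> None:
        node = self._roots.setdefault(extra_key, _Node())
        for tok in token_ids:
            node = node.children.setdefault(tok, _Node())

    def match_prefix(self, token_ids: List[int],
                     extra_key: Optional[str] = None) -> int:
        """Return how many leading tokens are already cached."""
        node = self._roots.get(extra_key)
        matched = 0
        while node is not None and matched < len(token_ids):
            node = node.children.get(token_ids[matched])
            if node is not None:
                matched += 1
        return matched


trie = ToyPrefixTrie()
trie.insert([1, 2, 3, 4])
print(trie.match_prefix([1, 2, 9]))           # 2
print(trie.match_prefix([1, 2, 3], "lora"))   # 0, different namespace
```

The second lookup returns 0 because keys with a different namespace tag never share nodes, matching the isolation that `extra_key` is documented to provide.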

abstractmethod insert(key, value=None, **kwargs)

Insert key/value into the cache.

Parameters:

key (RadixKey)
value (Optional[torch.Tensor])
kwargs (Any)

Return type:

InsertResult

abstractmethod evict(num_tokens, swa_num_tokens=0)

Evict tokens to free memory.

Parameters:

num_tokens (int)
swa_num_tokens (int)

Return type:

EvictResult

abstractmethod inc_lock_ref(node)

Lock node (and ancestors) to prevent eviction.

Returns an opaque token (e.g. swa_boundary_id) that must be passed back to dec_lock_ref().

Parameters: node (Any)
Return type: Optional[Any]

abstractmethod dec_lock_ref(node, **kwargs)

Unlock node (and ancestors).

Parameters:

node (Any)
kwargs (Any)

Return type:

None
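Because every `inc_lock_ref()` call must be paired with a `dec_lock_ref()` call that receives the returned token, a context manager is one way to keep the two together. The sketch below is an assumption-laden illustration: `StubCache` and `locked` are hypothetical, and the keyword name `swa_boundary_id` used to hand the token back is a guess, since the source only shows `**kwargs`.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional


class StubCache:
    """Hypothetical stand-in that only counts locks; not a real cache."""

    def __init__(self) -> None:
        self.lock_count = 0
        self.last_unlock_kwargs: Optional[Dict[str, Any]] = None

    def inc_lock_ref(self, node: Any) -> Optional[Any]:
        self.lock_count += 1
        return 7  # opaque token, arbitrary value for the demo

    def dec_lock_ref(self, node: Any, **kwargs: Any) -> None:
        self.lock_count -= 1
        self.last_unlock_kwargs = kwargs


@contextmanager
def locked(cache: Any, node: Any,
           token_kwarg: str = "swa_boundary_id") -> Iterator[Optional[Any]]:
    """Lock `node` for the duration of the block, then unlock it."""
    token = cache.inc_lock_ref(node)
    try:
        yield token
    finally:
        # Assumption: the token travels back as a keyword argument.
        kwargs = {} if token is None else {token_kwarg: token}
        cache.dec_lock_ref(node, **kwargs)


cache = StubCache()
with locked(cache, "some-node"):
    pass  # the node cannot be evicted while the block runs
```

The `finally` clause ensures the unlock happens even if the body raises, so a failed request does not leave its nodes pinned.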

evictable_size()

Return type: int

swa_evictable_size()

Return type: int

protected_size()

Return type: int

swa_protected_size()

Return type: int

total_size()

Return type: int