pymllm.mem_cache.base_prefix_cache

Abstract base class and shared data types for prefix cache implementations.

All concrete caches (RadixCache, ChunkCache, MambaRadixCache) inherit from BasePrefixCache and share the data classes defined here.

Classes

RadixKey

Compound lookup key: token-id sequence + optional namespace tag.

MatchResult

Returned by BasePrefixCache.match_prefix().

InsertResult

Returned by BasePrefixCache.insert().

EvictResult

Returned by BasePrefixCache.evict().

BasePrefixCache

Abstract interface for all prefix cache implementations.

Functions

hash_token_ids(token_ids[, prior_hash])

SHA-256 hash of a page of token ids, optionally chained to a prior hash.

hash_to_int64(hex_str)

Convert the first 16 hex characters (64 bits) of a hex digest to a signed 64-bit integer.

hash_bytes(data)

Hash raw bytes with SHA-256 and reduce the digest to an unsigned 64-bit integer. Useful for multimodal embedding keys.

Module Contents

pymllm.mem_cache.base_prefix_cache.hash_token_ids(token_ids, prior_hash=None)

SHA-256 hash of a page of token ids, optionally chained to a prior hash.

Each token is encoded as a 4-byte little-endian unsigned integer; tuple tokens (bigram / EAGLE) have each element hashed in order. When prior_hash is supplied, the digest is seeded with the raw bytes of the previous hash, so the result depends on the preceding hash as well as the page's own tokens (position-aware).

Parameters:
  • token_ids (List[Union[int, Tuple[int, ...]]])

  • prior_hash (Optional[str])

Return type:

str
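The encoding described above can be sketched with the standard library. This is a minimal illustration, not the module's actual source: the name sketch_hash_token_ids is hypothetical, and it assumes that prior_hash is a hex digest string and that the return value is also a hex digest.

```python
import hashlib
from typing import List, Optional, Tuple, Union

Token = Union[int, Tuple[int, ...]]


def sketch_hash_token_ids(
    token_ids: List[Token], prior_hash: Optional[str] = None
) -> str:
    """Hypothetical re-implementation of the documented hashing scheme."""
    hasher = hashlib.sha256()
    if prior_hash is not None:
        # Assumption: prior_hash is a hex string; seed with its raw bytes.
        hasher.update(bytes.fromhex(prior_hash))
    for token in token_ids:
        # A plain int is treated as a one-element tuple.
        elements = token if isinstance(token, tuple) else (token,)
        for element in elements:
            # 4-byte little-endian unsigned encoding, as documented.
            hasher.update(element.to_bytes(4, byteorder="little", signed=False))
    return hasher.hexdigest()


# Chain two pages: the second hash is seeded with the first.
page_0 = sketch_hash_token_ids([1, 2, 3])
page_1 = sketch_hash_token_ids([1, 2, 3], prior_hash=page_0)
```

In this sketch the two pages hold identical tokens, yet page_1 differs from page_0 because it is seeded with the earlier digest.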

pymllm.mem_cache.base_prefix_cache.hash_to_int64(hex_str)

Convert the first 16 hex characters (64 bits) of a hex digest to a signed 64-bit integer.

Parameters:

hex_str (str)

Return type:

int
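One plausible way to perform this conversion is shown below. It is a sketch only: the name sketch_hash_to_int64 is hypothetical, and the two's-complement reinterpretation of the top bit is an assumption about how the signed value is derived.

```python
def sketch_hash_to_int64(hex_str: str) -> int:
    """Hypothetical conversion of a hex digest prefix to a signed 64-bit int."""
    # The first 16 hex characters carry 64 bits.
    unsigned = int(hex_str[:16], 16)
    # Assumption: values with the top bit set wrap into the negative range.
    if unsigned >= 1 << 63:
        unsigned -= 1 << 64
    return unsigned


all_ones = sketch_hash_to_int64("f" * 64)
largest = sketch_hash_to_int64("7" + "f" * 15 + "0" * 48)
```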

pymllm.mem_cache.base_prefix_cache.hash_bytes(data)

Hash raw bytes with SHA-256 and reduce the digest to an unsigned 64-bit integer. Useful for multimodal embedding keys.

Parameters:

data (bytes)

Return type:

int
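The reduction from a 256-bit digest to 64 bits is not specified here. The sketch below assumes the first 8 digest bytes are read as a big-endian integer; the name sketch_hash_bytes and that byte selection are illustrative assumptions, not the module's confirmed behaviour.

```python
import hashlib


def sketch_hash_bytes(data: bytes) -> int:
    """Hypothetical SHA-256 to unsigned 64-bit reduction."""
    digest = hashlib.sha256(data).digest()
    # Assumption: keep the first 8 bytes, interpreted as big-endian unsigned.
    return int.from_bytes(digest[:8], byteorder="big", signed=False)


embedding_key = sketch_hash_bytes(b"raw image bytes")
```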

class pymllm.mem_cache.base_prefix_cache.RadixKey(token_ids, extra_key=None)

Compound lookup key: token-id sequence + optional namespace tag.

extra_key isolates independent namespaces so that sequences with identical leading tokens but different adapters / LoRA ids / multimodal context hashes never share prefix nodes.

Parameters:
  • token_ids (List[Union[int, Tuple[int, ...]]])

  • extra_key (Optional[str])

__slots__ = ('token_ids', 'extra_key')

token_ids

extra_key = None

__len__()

Return type:

int

__iter__()

Return type:

Iterator

__getitem__(idx)

Parameters:

idx (Union[int, slice])

Return type:

RadixKey

__repr__()

Return type:

str
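The members listed above suggest a thin sequence wrapper whose slices keep the namespace tag. The following stand-in class illustrates that shape; SketchRadixKey is a hypothetical name, and the behaviour of integer indexing (returning a one-token key) is an assumption based only on the documented RadixKey return type.

```python
from typing import Iterator, List, Optional, Tuple, Union

Token = Union[int, Tuple[int, ...]]


class SketchRadixKey:
    """Hypothetical stand-in mirroring the documented RadixKey members."""

    __slots__ = ("token_ids", "extra_key")

    def __init__(self, token_ids: List[Token], extra_key: Optional[str] = None):
        self.token_ids = token_ids
        self.extra_key = extra_key

    def __len__(self) -> int:
        return len(self.token_ids)

    def __iter__(self) -> Iterator[Token]:
        return iter(self.token_ids)

    def __getitem__(self, idx: Union[int, slice]) -> "SketchRadixKey":
        if isinstance(idx, slice):
            # Slices carry the namespace tag along with the tokens.
            return SketchRadixKey(self.token_ids[idx], self.extra_key)
        # Assumption: an int index yields a key holding that single token.
        return SketchRadixKey([self.token_ids[idx]], self.extra_key)

    def __repr__(self) -> str:
        return f"SketchRadixKey(extra_key={self.extra_key!r}, token_ids={self.token_ids!r})"


key = SketchRadixKey([1, 2, 3, 4], extra_key="lora-a")
prefix = key[:2]
```

Keeping extra_key on every slice means a prefix taken from a namespaced key stays in the same namespace when it is looked up or inserted.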

class pymllm.mem_cache.base_prefix_cache.MatchResult

Returned by BasePrefixCache.match_prefix().

indices: torch.Tensor
last_node: Any = None
prefix_len: int = 0
mamba_branching_seqlen: int | None = None

class pymllm.mem_cache.base_prefix_cache.InsertResult

Returned by BasePrefixCache.insert().

prefix_len: int = 0
last_node: Any = None
mamba_exist: bool = False

class pymllm.mem_cache.base_prefix_cache.EvictResult

Returned by BasePrefixCache.evict().

full_evicted: int = 0
swa_evicted: int = 0
mamba_evicted: int = 0

class pymllm.mem_cache.base_prefix_cache.BasePrefixCache

Bases: abc.ABC

Abstract interface for all prefix cache implementations.

Concrete implementations:

  • RadixCache – radix tree with sliding-window attention (SWA) tombstone support

  • ChunkCache – no-op fallback used when disable_radix_cache=True

  • MambaRadixCache – radix tree with independent Mamba/SSM state tracking

abstractmethod reset()

Clear all cached state and re-initialise.

Return type:

None

abstractmethod match_prefix(key)

Find the longest cached prefix of key.

Parameters:

key (RadixKey)

Return type:

MatchResult

abstractmethod insert(key, value=None, **kwargs)

Insert key/value into the cache.

Parameters:
  • key (RadixKey)

  • value (Optional[torch.Tensor])

  • kwargs (Any)

Return type:

InsertResult

abstractmethod evict(num_tokens, swa_num_tokens=0)

Evict tokens to free memory.

Parameters:
  • num_tokens (int)

  • swa_num_tokens (int)

Return type:

EvictResult

abstractmethod inc_lock_ref(node)

Lock node (and ancestors) to prevent eviction.

Returns an opaque token (e.g. swa_boundary_id) that must be passed back to dec_lock_ref().

Parameters:

node (Any)

Return type:

Optional[Any]

abstractmethod dec_lock_ref(node, **kwargs)

Unlock node (and ancestors).

Parameters:
  • node (Any)

  • kwargs (Any)

Return type:

None
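Because the token returned by inc_lock_ref() must be handed back to dec_lock_ref(), pairing the two calls in a context manager is one way to avoid leaking a lock. The sketch below is illustrative only: hold_lock and ToyCache are hypothetical, and the keyword under which dec_lock_ref() expects the token is not documented here, so the caller supplies it through token_kwarg.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def hold_lock(cache: Any, node: Any, token_kwarg: str) -> Iterator[Any]:
    """Lock node for the duration of a with-block, then always unlock it."""
    token = cache.inc_lock_ref(node)
    try:
        yield token
    finally:
        # The token (possibly None) is passed back under the caller's keyword.
        cache.dec_lock_ref(node, **{token_kwarg: token})


class ToyCache:
    """Stand-in exposing only the two lock methods; not a real prefix cache."""

    def __init__(self) -> None:
        self.lock_counts: Dict[Any, int] = {}
        self.released_with: Dict[str, Any] = {}

    def inc_lock_ref(self, node: Any) -> str:
        self.lock_counts[node] = self.lock_counts.get(node, 0) + 1
        return f"token-for-{node}"

    def dec_lock_ref(self, node: Any, **kwargs: Any) -> None:
        self.released_with = kwargs
        self.lock_counts[node] -= 1


toy = ToyCache()
# "lock_token" is a made-up keyword name used only by this toy example.
with hold_lock(toy, "node-a", token_kwarg="lock_token") as token:
    locked_during = toy.lock_counts["node-a"]
locked_after = toy.lock_counts["node-a"]
```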

evictable_size()

Return type:

int

swa_evictable_size()

Return type:

int

protected_size()

Return type:

int

swa_protected_size()

Return type:

int

total_size()

Return type:

int