pymllm.mem_cache.base_prefix_cache¶
Abstract base class and shared data types for prefix cache implementations.
All concrete caches (RadixCache, ChunkCache,
MambaRadixCache) inherit from BasePrefixCache and share
the data classes defined here.
Classes¶
| RadixKey | Compound lookup key: token-id sequence + optional namespace tag. |
| MatchResult | Returned by BasePrefixCache.match_prefix(). |
| InsertResult | Returned by BasePrefixCache.insert(). |
| EvictResult | Returned by BasePrefixCache.evict(). |
| BasePrefixCache | Abstract interface for all prefix cache implementations. |
Functions¶
| hash_token_ids | SHA-256 hash of a token-id page with optional chain-hash. |
| hash_to_int64 | Convert a hex digest to a signed 64-bit integer (first 16 hex chars). |
| hash_bytes | SHA-256 -> unsigned 64-bit int. Useful for multimodal embedding keys. |
Module Contents¶
- pymllm.mem_cache.base_prefix_cache.hash_token_ids(token_ids, prior_hash=None)¶
SHA-256 hash of a token-id page with optional chain-hash.
Each token is encoded as a 4-byte little-endian unsigned integer; tuples (bigram / EAGLE) hash each element in order. When prior_hash is supplied the digest is seeded with the raw bytes of the previous hash, making the result position-aware.
- Parameters:
token_ids (List[Union[int, Tuple[int, Ellipsis]]])
prior_hash (Optional[str])
- Return type:
str
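The encoding and chaining described above can be sketched in plain Python. This is an illustrative reimplementation based solely on the description, not the module's actual code; the `_sketch` suffix marks it as hypothetical:

```python
import hashlib
from typing import List, Optional, Tuple, Union

def hash_token_ids_sketch(
    token_ids: List[Union[int, Tuple[int, ...]]],
    prior_hash: Optional[str] = None,
) -> str:
    """Hash one page of token ids, optionally chained to the previous page."""
    h = hashlib.sha256()
    if prior_hash is not None:
        # Seed with the raw bytes of the previous digest so the result
        # depends on the page's position in the sequence, not just its tokens.
        h.update(bytes.fromhex(prior_hash))
    for tok in token_ids:
        if isinstance(tok, tuple):  # bigram / EAGLE entries: hash each element in order
            for t in tok:
                h.update(t.to_bytes(4, "little", signed=False))
        else:
            h.update(tok.to_bytes(4, "little", signed=False))
    return h.hexdigest()

# Chaining makes identical pages at different positions hash differently.
page1 = hash_token_ids_sketch([1, 2, 3])
page2 = hash_token_ids_sketch([1, 2, 3], prior_hash=page1)
assert page1 != page2
```

The position-awareness is what lets page-granular caches reuse a page only when its entire preceding context matches.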
- pymllm.mem_cache.base_prefix_cache.hash_to_int64(hex_str)¶
Convert a hex digest to a signed 64-bit integer (first 16 hex chars).
- Parameters:
hex_str (str)
- Return type:
int
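The signed reinterpretation of the leading 16 hex characters can be sketched as follows (an assumed implementation matching the one-line description, not the module's source):

```python
def hash_to_int64_sketch(hex_str: str) -> int:
    """Interpret the first 16 hex chars (64 bits) of a digest as a signed int64."""
    v = int(hex_str[:16], 16)                     # unsigned value in [0, 2**64)
    return v - (1 << 64) if v >= (1 << 63) else v # two's-complement wrap

# A digest starting with sixteen 'f's maps to -1 after reinterpretation.
assert hash_to_int64_sketch("ff" * 32) == -1
```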
- pymllm.mem_cache.base_prefix_cache.hash_bytes(data)¶
SHA-256 -> unsigned 64-bit int. Useful for multimodal embedding keys.
- Parameters:
data (bytes)
- Return type:
int
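A plausible sketch of this helper, under the assumption that the unsigned 64-bit value is taken from the leading bits of the digest (the truncation choice is an assumption, not confirmed by the source):

```python
import hashlib

def hash_bytes_sketch(data: bytes) -> int:
    """SHA-256 the payload and keep the first 64 bits as an unsigned int.

    Assumed layout: leading 16 hex chars of the digest; the real module
    may truncate differently.
    """
    return int(hashlib.sha256(data).hexdigest()[:16], 16)
```

Hashing raw embedding bytes this way yields a compact integer key suitable for deduplicating multimodal inputs.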
- class pymllm.mem_cache.base_prefix_cache.RadixKey(token_ids, extra_key=None)¶
Compound lookup key: token-id sequence + optional namespace tag.
extra_key isolates independent namespaces so that sequences with identical leading tokens but different adapters / LoRA ids / multimodal context hashes never share prefix nodes.
- Parameters:
token_ids (List[Union[int, Tuple[int, Ellipsis]]])
extra_key (Optional[str])
- __slots__ = ('token_ids', 'extra_key')¶
- token_ids¶
- extra_key = None¶
- __len__()¶
- Return type:
int
- __iter__()¶
- Return type:
Iterator
- __repr__()¶
- Return type:
str
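The namespacing behaviour of extra_key can be illustrated with a toy lookup table. The dictionary-based cache below is entirely hypothetical; only the keying behaviour mirrors the description above:

```python
from typing import Dict, List, Optional, Tuple

# Toy prefix table keyed the way RadixKey namespaces lookups: two sequences
# with identical tokens but different extra_key values (e.g. different LoRA
# adapter ids) never resolve to the same entry.
table: Dict[Tuple[Optional[str], Tuple[int, ...]], str] = {}

def put(token_ids: List[int], extra_key: Optional[str], node: str) -> None:
    table[(extra_key, tuple(token_ids))] = node

put([1, 2, 3], None, "node-a")
put([1, 2, 3], "lora-7", "node-b")  # same tokens, different namespace

assert table[(None, (1, 2, 3))] == "node-a"
assert table[("lora-7", (1, 2, 3))] == "node-b"
```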
- class pymllm.mem_cache.base_prefix_cache.MatchResult¶
Returned by BasePrefixCache.match_prefix().
- indices: torch.Tensor¶
- last_node: Any = None¶
- prefix_len: int = 0¶
- mamba_branching_seqlen: int | None = None¶
- class pymllm.mem_cache.base_prefix_cache.InsertResult¶
Returned by BasePrefixCache.insert().
- prefix_len: int = 0¶
- last_node: Any = None¶
- mamba_exist: bool = False¶
- class pymllm.mem_cache.base_prefix_cache.EvictResult¶
Returned by BasePrefixCache.evict().
- full_evicted: int = 0¶
- swa_evicted: int = 0¶
- mamba_evicted: int = 0¶
- class pymllm.mem_cache.base_prefix_cache.BasePrefixCache¶
Bases: abc.ABC
Abstract interface for all prefix cache implementations.
Concrete implementations:
- RadixCache – radix-tree with SWA tombstone support
- ChunkCache – no-op fallback (disable_radix_cache=True)
- MambaRadixCache – radix-tree with independent Mamba/SSM state tracking
- abstractmethod reset()¶
Clear all cached state and re-initialise.
- Return type:
None
- abstractmethod match_prefix(key)¶
Find the longest cached prefix of key.
- Parameters:
key (RadixKey)
- Return type:
MatchResult
- abstractmethod insert(key, value=None, **kwargs)¶
Insert key/value into the cache.
- Parameters:
key (RadixKey)
value (Optional[torch.Tensor])
kwargs (Any)
- Return type:
InsertResult
- abstractmethod evict(num_tokens, swa_num_tokens=0)¶
Evict tokens to free memory.
- Parameters:
num_tokens (int)
swa_num_tokens (int)
- Return type:
EvictResult
- abstractmethod inc_lock_ref(node)¶
Lock node (and ancestors) to prevent eviction.
Returns an opaque token (e.g. swa_boundary_id) that must be passed back to dec_lock_ref().
- Parameters:
node (Any)
- Return type:
Optional[Any]
- abstractmethod dec_lock_ref(node, **kwargs)¶
Unlock node (and ancestors).
- Parameters:
node (Any)
kwargs (Any)
- Return type:
None
- evictable_size()¶
- Return type:
int
- swa_evictable_size()¶
- Return type:
int
- protected_size()¶
- Return type:
int
- swa_protected_size()¶
- Return type:
int
- total_size()¶
- Return type:
int
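A minimal concrete implementation, loosely modelled on the no-op ChunkCache fallback described above, illustrates how the abstract interface fits together. This is a standalone sketch: the base class and result types are re-declared locally (with plain lists instead of torch.Tensor) and do not match the real module's internals:

```python
import abc
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Stand-in result types mirroring the dataclasses documented above.
@dataclass
class MatchResult:
    indices: List[int] = field(default_factory=list)
    last_node: Any = None
    prefix_len: int = 0

@dataclass
class InsertResult:
    prefix_len: int = 0
    last_node: Any = None

@dataclass
class EvictResult:
    full_evicted: int = 0

class BasePrefixCache(abc.ABC):
    @abc.abstractmethod
    def reset(self) -> None: ...
    @abc.abstractmethod
    def match_prefix(self, key) -> MatchResult: ...
    @abc.abstractmethod
    def insert(self, key, value=None, **kwargs) -> InsertResult: ...
    @abc.abstractmethod
    def evict(self, num_tokens: int, swa_num_tokens: int = 0) -> EvictResult: ...
    @abc.abstractmethod
    def inc_lock_ref(self, node) -> Optional[Any]: ...
    @abc.abstractmethod
    def dec_lock_ref(self, node, **kwargs) -> None: ...

class NoOpCache(BasePrefixCache):
    """ChunkCache-style fallback: never matches a prefix, never stores one."""
    def reset(self) -> None:
        pass
    def match_prefix(self, key) -> MatchResult:
        return MatchResult()       # always a cache miss
    def insert(self, key, value=None, **kwargs) -> InsertResult:
        return InsertResult()      # nothing actually cached
    def evict(self, num_tokens, swa_num_tokens=0) -> EvictResult:
        return EvictResult()       # nothing to evict
    def inc_lock_ref(self, node) -> Optional[Any]:
        return None                # no nodes, so no lock token
    def dec_lock_ref(self, node, **kwargs) -> None:
        pass

cache = NoOpCache()
assert cache.match_prefix(None).prefix_len == 0
```

A real implementation such as RadixCache would walk a radix tree in match_prefix and return the matched node as last_node, so that inc_lock_ref can pin it against eviction while the request is in flight.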