pymllm.mem_cache.chunk_cache

No-op prefix cache used when disable_radix_cache=True.

Every request is computed fully from scratch: there is no prefix sharing, no tree structure, and no eviction logic. This is the simplest possible BasePrefixCache implementation.
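The cost of computing every request from scratch can be estimated with simple arithmetic. The sketch below is illustrative only. It assumes N requests that share a common prefix of P tokens followed by S unique tokens each, and it assumes that a sharing cache computes the prefix once and keeps it cached for every later request. `prefill_tokens` is a hypothetical helper, not part of pymllm.

```python
def prefill_tokens(num_requests: int, shared_len: int, suffix_len: int, share_prefix: bool) -> int:
    """Total prompt tokens whose KV entries must be computed.

    Assumes every request is `shared_len` common tokens followed by
    `suffix_len` unique tokens. When sharing, the common prefix is assumed
    to be computed once and to stay cached for all later requests.
    """
    if num_requests == 0:
        return 0
    if share_prefix:
        # Prefix computed once, then only each request's unique suffix.
        return shared_len + num_requests * suffix_len
    # No-op cache (ChunkCache behavior): each request recomputes its whole prompt.
    return num_requests * (shared_len + suffix_len)


print(prefill_tokens(4, 1000, 50, share_prefix=True))   # 1200
print(prefill_tokens(4, 1000, 50, share_prefix=False))  # 4200
```

With a long shared system prompt, disabling the radix cache multiplies prefill work roughly by the number of requests that would have shared it.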

Classes

ChunkCache

No-op prefix cache: no prefix sharing, no eviction.

Module Contents

class pymllm.mem_cache.chunk_cache.ChunkCache(token_to_kv_pool_allocator=None, device=torch.device('cpu'))

Bases: pymllm.mem_cache.base_prefix_cache.BasePrefixCache

No-op prefix cache: no prefix sharing, no eviction.

When the radix cache is disabled, this class replaces it so that the rest of the system can call the same interface without branching.
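A minimal sketch of the "same interface without branching" idea. It assumes a simplified two-method subset of the interface. Token lists stand in for the RadixKey and KV-index tensors the real classes use. `PrefixCacheLike`, `NoOpCache`, `ToyRadixCache`, `make_cache`, and `prefill_len` are hypothetical names, and `ToyRadixCache` is a flat list of stored keys rather than a real radix tree.

```python
from typing import List, Protocol


class PrefixCacheLike(Protocol):
    """Hypothetical two-method subset of the BasePrefixCache interface."""

    def match_prefix(self, key: List[int]) -> List[int]: ...

    def insert(self, key: List[int]) -> None: ...


class NoOpCache:
    """Mirrors ChunkCache's behavior: never matches, never stores."""

    def match_prefix(self, key: List[int]) -> List[int]:
        return []

    def insert(self, key: List[int]) -> None:
        return None


class ToyRadixCache:
    """Hypothetical sharing cache: stores whole keys, matches the longest common prefix."""

    def __init__(self) -> None:
        self._keys: List[List[int]] = []

    def match_prefix(self, key: List[int]) -> List[int]:
        best = 0
        for stored in self._keys:
            n = 0
            while n < min(len(stored), len(key)) and stored[n] == key[n]:
                n += 1
            best = max(best, n)
        return key[:best]

    def insert(self, key: List[int]) -> None:
        self._keys.append(list(key))


def make_cache(disable_radix_cache: bool) -> PrefixCacheLike:
    # The only place the flag is consulted.
    return NoOpCache() if disable_radix_cache else ToyRadixCache()


def prefill_len(cache: PrefixCacheLike, prompt: List[int]) -> int:
    # Caller code is identical for both caches: no `if disable_radix_cache` here.
    matched = cache.match_prefix(prompt)
    cache.insert(prompt)
    return len(prompt) - len(matched)
```

For example, with `noop = make_cache(True)` the calls `prefill_len(noop, [1, 2, 3])` and `prefill_len(noop, [1, 2, 3, 4])` return 3 and 4, while the same two calls on `make_cache(False)` return 3 and 1, because the second prompt reuses the first one's three tokens.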

Parameters:

token_to_kv_pool_allocator (Any) – Pool allocator used to free KV indices on request completion.
device (torch.device) – Device for empty tensors returned by match_prefix().

pool = None

device

reset()

Clear all cached state and re-initialise.

Return type: None

match_prefix(key)

Always returns an empty match (no prefix sharing).

Parameters: key (pymllm.mem_cache.base_prefix_cache.RadixKey)
Return type: pymllm.mem_cache.base_prefix_cache.MatchResult
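Returning an empty result rather than None lets callers treat a miss exactly like a zero-length hit. The sketch below assumes a caller that concatenates the matched KV slots with freshly allocated ones. Lists stand in for tensors, and `ToyMatchResult`, its fields `device_indices` and `last_node`, `ToyChunkCache`, and `build_kv_indices` are hypothetical names; the real MatchResult fields may differ. In the real class the empty tensor is created on `device`, presumably so it can be combined with other tensors on that device without a transfer.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ToyMatchResult:
    """Hypothetical stand-in for MatchResult; real field names may differ."""

    device_indices: List[int] = field(default_factory=list)  # KV slots of the matched prefix
    last_node: Optional[Any] = None  # no tree, so no node to report


class ToyChunkCache:
    def __init__(self, device: str = "cpu") -> None:
        # Where a real implementation would place its empty tensor.
        self.device = device

    def match_prefix(self, key: List[int]) -> ToyMatchResult:
        # Always an empty match, whatever the key.
        return ToyMatchResult()


def build_kv_indices(cache: ToyChunkCache, prompt: List[int], next_free_slot: int) -> List[int]:
    """Hypothetical caller: reuse matched slots, allocate fresh ones for the rest."""
    match = cache.match_prefix(prompt)
    num_new = len(prompt) - len(match.device_indices)
    fresh = list(range(next_free_slot, next_free_slot + num_new))
    # Concatenating an empty prefix is harmless, so no special case is needed.
    return match.device_indices + fresh
```

With this sketch, `build_kv_indices(ToyChunkCache(), [7, 8, 9], next_free_slot=100)` returns `[100, 101, 102]`: every token gets a newly allocated slot.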

insert(key, value=None, **kwargs)

No-op: nothing is cached.

Parameters:

key (pymllm.mem_cache.base_prefix_cache.RadixKey)
value (Optional[torch.Tensor])
kwargs (Any)

Return type:

pymllm.mem_cache.base_prefix_cache.InsertResult

evict(num_tokens, swa_num_tokens=0)

No-op: nothing to evict.

Parameters:

num_tokens (int)
swa_num_tokens (int)

Return type:

pymllm.mem_cache.base_prefix_cache.EvictResult
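Because KV indices are freed through the pool allocator when a request completes, a no-op cache holds no idle entries, so eviction has nothing to reclaim. The sketch below shows how a hypothetical scheduler check might behave against such a cache. `ToyEvictResult`, its `num_tokens_evicted` field, `ToyChunkCache`, and `try_reserve` are illustrative names, not pymllm's API.

```python
from dataclasses import dataclass


@dataclass
class ToyEvictResult:
    """Hypothetical stand-in for EvictResult."""

    num_tokens_evicted: int = 0


class ToyChunkCache:
    def evict(self, num_tokens: int, swa_num_tokens: int = 0) -> ToyEvictResult:
        # Nothing is cached, so nothing can be freed.
        return ToyEvictResult(num_tokens_evicted=0)


def try_reserve(cache: ToyChunkCache, free_slots: int, needed: int) -> bool:
    """Hypothetical scheduler check: evict to make room, then test capacity."""
    if needed > free_slots:
        freed = cache.evict(needed - free_slots).num_tokens_evicted
        free_slots += freed
    return needed <= free_slots
```

With a no-op cache, `try_reserve(ToyChunkCache(), free_slots=10, needed=20)` returns False: the caller has to wait for running requests to finish (or reject the request) instead of reclaiming cached prefixes.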

inc_lock_ref(node)

No-op: nothing to lock.

Parameters: node (Any)
Return type: Optional[Any]

dec_lock_ref(node, **kwargs)

No-op: nothing to unlock.

Parameters:

node (Any)
kwargs (Any)

Return type:

None
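In a tree-based cache, lock references presumably keep matched nodes from being evicted while a request uses them. Callers can pair the two calls without knowing which cache they hold. The sketch below wraps the pairing in a context manager; `ToyChunkCache` and `locked` are hypothetical names, and with ChunkCache's no-op methods the wrapper simply runs the body.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Optional


class ToyChunkCache:
    def inc_lock_ref(self, node: Any) -> Optional[Any]:
        return None  # nothing to lock

    def dec_lock_ref(self, node: Any, **kwargs: Any) -> None:
        return None  # nothing to unlock


@contextmanager
def locked(cache: Any, node: Any) -> Iterator[None]:
    """Hypothetical helper: hold a lock reference on `node` for the block's duration."""
    cache.inc_lock_ref(node)
    try:
        yield
    finally:
        # Always released, even if the body raises.
        cache.dec_lock_ref(node)
```

Usage: `with locked(cache, match_node): ...` runs the request body unchanged. For a no-op cache the pairing costs two trivial calls, and the same code keeps working if a tree-based cache is swapped in.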