pymllm.mem_cache.chunk_cache
No-op prefix cache used when disable_radix_cache=True.
Every request is fully computed from scratch – no prefix sharing, no
tree structure, no eviction logic. This is the simplest possible
BasePrefixCache implementation.
Classes
ChunkCache | No-op prefix cache: no prefix sharing, no eviction.
Module Contents
- class pymllm.mem_cache.chunk_cache.ChunkCache(token_to_kv_pool_allocator=None, device=torch.device('cpu'))
Bases: pymllm.mem_cache.base_prefix_cache.BasePrefixCache
No-op prefix cache: no prefix sharing, no eviction.
When the radix cache is disabled, this class replaces it so that the rest of the system can call the same interface without branching.
- Parameters:
token_to_kv_pool_allocator (Any) – Pool allocator used to free KV indices on request completion.
device (torch.device) – Device for empty tensors returned by match_prefix().
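The "same interface without branching" idea can be illustrated with a small, self-contained sketch. It does not import pymllm: `ToyPrefixCache`, `ToyNoOpCache`, and `tokens_to_compute` are hypothetical stand-ins invented for this example, and plain Python lists take the place of tensors. Only the method names `match_prefix` and `insert` come from the listing on this page.

```python
from typing import List


class ToyPrefixCache:
    """Hypothetical stand-in for a sharing cache: remembers whole sequences."""

    def __init__(self) -> None:
        self._seen: List[List[int]] = []

    def match_prefix(self, key: List[int]) -> List[int]:
        # Return the longest prefix of `key` shared with any stored sequence.
        best: List[int] = []
        for seq in self._seen:
            n = 0
            while n < min(len(seq), len(key)) and seq[n] == key[n]:
                n += 1
            if n > len(best):
                best = key[:n]
        return best

    def insert(self, key: List[int]) -> None:
        self._seen.append(list(key))


class ToyNoOpCache:
    """Hypothetical stand-in mirroring the no-op behaviour described above."""

    def match_prefix(self, key: List[int]) -> List[int]:
        return []  # always an empty match

    def insert(self, key: List[int]) -> None:
        pass  # nothing is cached


def tokens_to_compute(cache, prompt: List[int]) -> int:
    """Caller code is identical for both caches: no 'is caching on?' branch."""
    matched = cache.match_prefix(prompt)
    cache.insert(prompt)
    return len(prompt) - len(matched)


first, second = [1, 2, 3, 4], [1, 2, 3, 9, 9]
shared, noop = ToyPrefixCache(), ToyNoOpCache()

shared_costs = [tokens_to_compute(shared, p) for p in (first, second)]
noop_costs = [tokens_to_compute(noop, p) for p in (first, second)]
print(shared_costs)  # [4, 2]  second prompt reuses the 3-token prefix
print(noop_costs)    # [4, 5]  every prompt is computed from scratch
```

With the no-op stand-in, the caller always sees a zero-length match, so each request pays for its full length. This is the behaviour the page describes for `disable_radix_cache=True`.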
- pool = None
- device
- reset()
Clear all cached state and re-initialise.
- Return type:
None
- match_prefix(key)
Always returns an empty match (no prefix sharing).
- Parameters:
- Return type:
- insert(key, value=None, **kwargs)
No-op: nothing is cached.
- Parameters:
value (Optional[torch.Tensor])
kwargs (Any)
- Return type:
- evict(num_tokens, swa_num_tokens=0)
No-op: nothing to evict.
- Parameters:
num_tokens (int)
swa_num_tokens (int)
- Return type:
- inc_lock_ref(node)
No-op: nothing to lock.
- Parameters:
node (Any)
- Return type:
Optional[Any]
- dec_lock_ref(node, **kwargs)
No-op: nothing to unlock.
- Parameters:
node (Any)
kwargs (Any)
- Return type:
None
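Because `evict`, `inc_lock_ref`, and `dec_lock_ref` are all no-ops, calling code can pin, unpin, and request eviction unconditionally. The sketch below shows that pattern under stated assumptions: `ToyNoOpCache` and `run_request` are hypothetical names made up for illustration, not part of pymllm, and the "work" step is a placeholder. The method signatures follow the listing above. The return value of `evict` is not shown on this page, so the stand-in leaves it unannotated.

```python
from typing import Any, Optional


class ToyNoOpCache:
    """Hypothetical stand-in; method names follow the signatures listed above."""

    def evict(self, num_tokens: int, swa_num_tokens: int = 0):
        # Nothing is stored, so there is nothing to evict.
        # (Return type is not given in the listing; None is an assumption.)
        return None

    def inc_lock_ref(self, node: Any) -> Optional[Any]:
        return None  # no tree nodes exist to pin

    def dec_lock_ref(self, node: Any, **kwargs: Any) -> None:
        return None  # no pin to release


def run_request(cache: Any, node: Any = None) -> str:
    """Hypothetical scheduler step: pin, do the work, unpin, then ask for space."""
    cache.inc_lock_ref(node)
    try:
        result = "done"  # placeholder for the real forward pass
    finally:
        cache.dec_lock_ref(node)  # always released, even if the work raises
    cache.evict(num_tokens=128)
    return result


status = run_request(ToyNoOpCache())
print(status)
```

The same `run_request` body would work against a cache that really tracks lock counts, which is why the no-op methods keep the full signatures instead of being omitted.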