pymllm.mem_cache.chunk_cache

No-op prefix cache used when disable_radix_cache=True.

Every request is computed from scratch: there is no prefix sharing, no tree structure, and no eviction logic. This is the simplest possible BasePrefixCache implementation.

Classes

ChunkCache

No-op prefix cache: no prefix sharing, no eviction.

Module Contents

class pymllm.mem_cache.chunk_cache.ChunkCache(token_to_kv_pool_allocator=None, device=torch.device('cpu'))

Bases: pymllm.mem_cache.base_prefix_cache.BasePrefixCache

No-op prefix cache: no prefix sharing, no eviction.

When the radix cache is disabled, this class replaces it so that the rest of the system can call the same interface without branching.

Parameters:
  • token_to_kv_pool_allocator (Any) – Pool allocator used to free KV indices on request completion.

  • device (torch.device) – Device for empty tensors returned by match_prefix().
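
The "same interface without branching" idea described above can be sketched as a null-object pattern. The example below is self-contained and does not import pymllm: PrefixCacheLike, NoOpCache, ToyPrefixCache, and tokens_to_compute are hypothetical stand-ins, and the plain-list return values are a simplification of the real MatchResult and InsertResult types.

```python
from typing import List, Protocol, Sequence


class PrefixCacheLike(Protocol):
    """Hypothetical minimal interface shared by both cache variants."""

    def match_prefix(self, key: Sequence[int]) -> List[int]: ...
    def insert(self, key: Sequence[int]) -> int: ...


class NoOpCache:
    """Stand-in for a no-op cache: never matches, never stores."""

    def match_prefix(self, key: Sequence[int]) -> List[int]:
        return []  # empty match: no tokens are reused

    def insert(self, key: Sequence[int]) -> int:
        return 0  # nothing was cached


class ToyPrefixCache:
    """Stand-in for a real prefix cache (a flat list, not a radix tree)."""

    def __init__(self) -> None:
        self._seen: List[List[int]] = []

    def match_prefix(self, key: Sequence[int]) -> List[int]:
        best: List[int] = []
        for seq in self._seen:
            # Count how many leading tokens this stored sequence shares.
            n = 0
            while n < min(len(seq), len(key)) and seq[n] == key[n]:
                n += 1
            if n > len(best):
                best = list(key[:n])
        return best

    def insert(self, key: Sequence[int]) -> int:
        self._seen.append(list(key))
        return len(key)


def tokens_to_compute(cache: PrefixCacheLike, key: Sequence[int]) -> int:
    """Caller code is identical for both caches: no `if disabled:` branch."""
    matched = cache.match_prefix(key)
    cache.insert(key)
    return len(key) - len(matched)
```

With the no-op variant, tokens_to_compute always equals the full request length. With the toy prefix cache, a repeated prefix reduces it.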

pool = None

device

reset()

Clear all cached state and re-initialise.

Return type:

None

match_prefix(key)

Always returns an empty match (no prefix sharing).

Parameters:

key (pymllm.mem_cache.base_prefix_cache.RadixKey)

Return type:

pymllm.mem_cache.base_prefix_cache.MatchResult
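
A minimal sketch of what "always returns an empty match" means for a caller. ToyMatchResult and its indices field are hypothetical; the fields of the real MatchResult are not shown in this reference, and the real method returns empty tensors on the configured device rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class ToyMatchResult:
    """Hypothetical stand-in; the real MatchResult fields may differ."""

    indices: List[int] = field(default_factory=list)  # matched KV indices


def match_prefix(key: Sequence[int]) -> ToyMatchResult:
    """Ignore the key entirely and report that nothing matched."""
    return ToyMatchResult()


def split_request(key: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a request into (reused, to_compute) token lists."""
    matched = len(match_prefix(key).indices)
    return list(key[:matched]), list(key[matched:])
```

Because the match is always empty, the reused part is always empty and the whole key falls into the part that must be computed.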

insert(key, value=None, **kwargs)

No-op: nothing is cached.

Parameters:
  • key

  • value

  • kwargs

Return type:

pymllm.mem_cache.base_prefix_cache.InsertResult

evict(num_tokens, swa_num_tokens=0)

No-op: nothing to evict.

Parameters:
  • num_tokens (int)

  • swa_num_tokens (int)

Return type:

pymllm.mem_cache.base_prefix_cache.EvictResult
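
The following sketch illustrates the consequence of a no-op evict for calling code. ToyEvictResult, its num_tokens_evicted field, and free_slots_after_evict are hypothetical names introduced for illustration; the real EvictResult fields are not documented here, and swa_num_tokens is accepted only to mirror the signature above.

```python
from dataclasses import dataclass


@dataclass
class ToyEvictResult:
    """Hypothetical stand-in; the real EvictResult fields may differ."""

    num_tokens_evicted: int = 0


def evict(num_tokens: int, swa_num_tokens: int = 0) -> ToyEvictResult:
    """Accept the request but free nothing, since nothing is cached."""
    return ToyEvictResult()


def free_slots_after_evict(free_slots: int, needed: int) -> int:
    """Hypothetical caller: request eviction only when short on slots."""
    if free_slots < needed:
        free_slots += evict(needed - free_slots).num_tokens_evicted
    return free_slots
```

Under these assumptions, eviction never recovers memory, so a caller would have to rely on KV indices being freed when requests complete.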

inc_lock_ref(node)

No-op: nothing to lock.

Parameters:

node (Any)

Return type:

Optional[Any]

dec_lock_ref(node, **kwargs)

No-op: nothing to unlock.

Parameters:
  • node (Any)

  • kwargs (Any)

Return type:

None
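
Since both lock methods are no-ops, code that pairs them around a unit of work behaves the same as it would with a locking cache. The sketch below uses a hypothetical NoOpLocks class and run_with_lock helper; returning None from inc_lock_ref is an assumption consistent with the Optional[Any] return type, not a documented guarantee.

```python
from typing import Any, Callable, Optional


class NoOpLocks:
    """Hypothetical stand-in exposing only the two lock methods."""

    def inc_lock_ref(self, node: Any) -> Optional[Any]:
        return None  # nothing to lock

    def dec_lock_ref(self, node: Any, **kwargs: Any) -> None:
        return None  # nothing to unlock


def run_with_lock(cache: NoOpLocks, node: Any, work: Callable[[], Any]) -> Any:
    """Pair every inc_lock_ref with a dec_lock_ref, even on error."""
    cache.inc_lock_ref(node)
    try:
        return work()
    finally:
        cache.dec_lock_ref(node)
```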