pymllm.mem_cache.chunk_cache
============================

.. py:module:: pymllm.mem_cache.chunk_cache

.. autoapi-nested-parse::

   No-op prefix cache used when ``disable_radix_cache=True``.

   Every request is fully computed from scratch -- no prefix sharing, no
   tree structure, no eviction logic. This is the simplest possible
   :class:`~pymllm.mem_cache.base_prefix_cache.BasePrefixCache` implementation.


Classes
-------

.. autoapisummary::

   pymllm.mem_cache.chunk_cache.ChunkCache


Module Contents
---------------

.. py:class:: ChunkCache(token_to_kv_pool_allocator = None, device = torch.device('cpu'))

   Bases: :py:obj:`pymllm.mem_cache.base_prefix_cache.BasePrefixCache`

   No-op prefix cache: no prefix sharing, no eviction.

   When the radix cache is disabled, this class replaces it so that the
   rest of the system can call the same interface without branching.

   :param token_to_kv_pool_allocator: Pool allocator used to free KV indices
                                      on request completion.
   :param device: Device for empty tensors returned by :meth:`match_prefix`.


   .. py:attribute:: pool
      :value: None


   .. py:attribute:: device


   .. py:method:: reset()

      Clear all cached state and re-initialise.


   .. py:method:: match_prefix(key)

      Always returns an empty match (no prefix sharing).


   .. py:method:: insert(key, value = None, **kwargs)

      No-op: nothing is cached.


   .. py:method:: evict(num_tokens, swa_num_tokens = 0)

      No-op: nothing to evict.


   .. py:method:: inc_lock_ref(node)

      No-op: nothing to lock.


   .. py:method:: dec_lock_ref(node, **kwargs)

      No-op: nothing to unlock.
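
The no-op pattern described above can be illustrated with a minimal, dependency-free sketch. It is not the pymllm implementation: ``NoOpPrefixCache``, ``MatchResult`` and ``tokens_to_compute`` are hypothetical names, plain Python lists stand in for the empty tensors that ``match_prefix`` returns, and the return shapes are assumptions, since this page does not document them. The sketch only shows why a caller written against the shared interface ends up computing every token when nothing is ever cached.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class MatchResult:
    """Hypothetical result type; the real return shape is not documented here."""

    indices: List[int] = field(default_factory=list)  # stand-in for an empty tensor
    last_node: Optional[Any] = None


class NoOpPrefixCache:
    """Illustrative stand-in for a no-op prefix cache (not the pymllm class)."""

    def __init__(self, pool: Optional[Any] = None) -> None:
        # Assumed to mirror the documented ``pool`` attribute.
        self.pool = pool

    def reset(self) -> None:
        # Nothing is stored, so there is no state to clear.
        return None

    def match_prefix(self, key: List[int]) -> MatchResult:
        # Always report an empty match, whatever was inserted before.
        return MatchResult()

    def insert(self, key: List[int], value: Any = None, **kwargs: Any) -> None:
        # No-op: nothing is cached.
        return None

    def evict(self, num_tokens: int, swa_num_tokens: int = 0) -> None:
        # No-op: nothing to evict.
        return None

    def inc_lock_ref(self, node: Any) -> None:
        # No-op: nothing to lock.
        return None

    def dec_lock_ref(self, node: Any, **kwargs: Any) -> None:
        # No-op: nothing to unlock.
        return None


def tokens_to_compute(cache: NoOpPrefixCache, prompt: List[int]) -> List[int]:
    """Caller-side logic that works unchanged for any cache with this interface."""
    matched = cache.match_prefix(prompt)
    # Skip the tokens covered by the matched prefix (none for a no-op cache).
    return prompt[len(matched.indices):]


cache = NoOpPrefixCache()
prompt = [101, 7592, 2088, 102]
cache.insert(prompt)  # has no effect
remaining = tokens_to_compute(cache, prompt)
```

Because ``match_prefix`` never reports a hit, ``remaining`` is the full prompt even after ``insert`` has been called with the same tokens. The calling code needs no ``if`` on whether the radix cache is enabled.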