pymllm.mem_cache.base_prefix_cache
==================================

.. py:module:: pymllm.mem_cache.base_prefix_cache

.. autoapi-nested-parse::

   Abstract base class and shared data types for prefix cache
   implementations.  All concrete caches (:class:`RadixCache`,
   :class:`ChunkCache`, :class:`MambaRadixCache`) inherit from
   :class:`BasePrefixCache` and share the data classes defined here.

Classes
-------

.. autoapisummary::

   pymllm.mem_cache.base_prefix_cache.RadixKey
   pymllm.mem_cache.base_prefix_cache.MatchResult
   pymllm.mem_cache.base_prefix_cache.InsertResult
   pymllm.mem_cache.base_prefix_cache.EvictResult
   pymllm.mem_cache.base_prefix_cache.BasePrefixCache

Functions
---------

.. autoapisummary::

   pymllm.mem_cache.base_prefix_cache.hash_token_ids
   pymllm.mem_cache.base_prefix_cache.hash_to_int64
   pymllm.mem_cache.base_prefix_cache.hash_bytes

Module Contents
---------------

.. py:function:: hash_token_ids(token_ids, prior_hash = None)

   SHA-256 hash of a token-id page with optional chain-hash.

   Each token is encoded as a 4-byte little-endian unsigned integer;
   tuples (bigram / EAGLE) hash each element in order.  When
   *prior_hash* is supplied, the digest is seeded with the raw bytes of
   the previous hash, making the result position-aware.

.. py:function:: hash_to_int64(hex_str)

   Convert a hex digest to a signed 64-bit integer (first 16 hex chars).

.. py:function:: hash_bytes(data)

   SHA-256 -> unsigned 64-bit int.  Useful for multimodal embedding keys.

.. py:class:: RadixKey(token_ids, extra_key = None)

   Compound lookup key: token-id sequence + optional namespace tag.

   ``extra_key`` isolates independent namespaces so that sequences with
   identical leading tokens but different adapters / LoRA ids /
   multimodal context hashes never share prefix nodes.

   .. py:attribute:: __slots__
      :value: ('token_ids', 'extra_key')

   .. py:attribute:: token_ids

   .. py:attribute:: extra_key
      :value: None

   .. py:method:: __len__()

   .. py:method:: __iter__()

   .. py:method:: __getitem__(idx)

   .. py:method:: __repr__()

.. py:class:: MatchResult

   Returned by :meth:`BasePrefixCache.match_prefix`.

   .. py:attribute:: indices
      :type: torch.Tensor

   .. py:attribute:: last_node
      :type: Any
      :value: None

   .. py:attribute:: prefix_len
      :type: int
      :value: 0

   .. py:attribute:: mamba_branching_seqlen
      :type: Optional[int]
      :value: None

.. py:class:: InsertResult

   Returned by :meth:`BasePrefixCache.insert`.

   .. py:attribute:: prefix_len
      :type: int
      :value: 0

   .. py:attribute:: last_node
      :type: Any
      :value: None

   .. py:attribute:: mamba_exist
      :type: bool
      :value: False

.. py:class:: EvictResult

   Returned by :meth:`BasePrefixCache.evict`.

   .. py:attribute:: full_evicted
      :type: int
      :value: 0

   .. py:attribute:: swa_evicted
      :type: int
      :value: 0

   .. py:attribute:: mamba_evicted
      :type: int
      :value: 0

.. py:class:: BasePrefixCache

   Bases: :py:obj:`abc.ABC`

   Abstract interface for all prefix cache implementations.

   Concrete implementations:

   * :class:`~pymllm.mem_cache.radix_cache.RadixCache` -- radix tree
     with SWA tombstone support
   * :class:`~pymllm.mem_cache.chunk_cache.ChunkCache` -- no-op
     fallback (``disable_radix_cache=True``)
   * :class:`~pymllm.mem_cache.mamba_radix_cache.MambaRadixCache` --
     radix tree with independent Mamba/SSM state tracking

   .. py:method:: reset()
      :abstractmethod:

      Clear all cached state and re-initialise.

   .. py:method:: match_prefix(key)
      :abstractmethod:

      Find the longest cached prefix of *key*.

   .. py:method:: insert(key, value = None, **kwargs)
      :abstractmethod:

      Insert *key*/*value* into the cache.

   .. py:method:: evict(num_tokens, swa_num_tokens = 0)
      :abstractmethod:

      Evict tokens to free memory.

   .. py:method:: inc_lock_ref(node)
      :abstractmethod:

      Lock *node* (and its ancestors) to prevent eviction.  Returns an
      opaque token (e.g. ``swa_boundary_id``) that must be passed back
      to :meth:`dec_lock_ref`.

   .. py:method:: dec_lock_ref(node, **kwargs)
      :abstractmethod:

      Unlock *node* (and its ancestors).

   .. py:method:: evictable_size()

   .. py:method:: swa_evictable_size()

   .. py:method:: protected_size()

   .. py:method:: swa_protected_size()

   .. py:method:: total_size()
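The hashing scheme documented above (4-byte little-endian token encoding, digest chaining via *prior_hash*, truncation of the first 16 hex chars to a signed 64-bit integer) can be sketched in plain Python. This is a minimal illustration of the documented behaviour, not the library's actual code; in particular, the assumption that *prior_hash* is a hex string (rather than raw bytes) is ours.

```python
import hashlib
import struct

def hash_token_ids(token_ids, prior_hash=None):
    # Seed the digest with the raw bytes of the previous page's hash so
    # identical pages at different positions hash differently (chain hash).
    h = hashlib.sha256(bytes.fromhex(prior_hash) if prior_hash else b"")
    for tok in token_ids:
        # Tuples (bigram / EAGLE) hash each element in order.
        for t in tok if isinstance(tok, tuple) else (tok,):
            h.update(struct.pack("<I", t))  # 4-byte little-endian unsigned
    return h.hexdigest()

def hash_to_int64(hex_str):
    # First 16 hex chars -> signed 64-bit integer (two's complement).
    v = int(hex_str[:16], 16)
    return v - (1 << 64) if v >= (1 << 63) else v

page = [101, 102, 103]              # hypothetical token-id page
first = hash_token_ids(page)
second = hash_token_ids(page, prior_hash=first)   # same tokens, next position
assert first != second              # chaining makes the hash position-aware
assert -(1 << 63) <= hash_to_int64(first) < (1 << 63)
```

Note that the chain hash is what lets a paged cache key each page independently while still distinguishing "page P after prefix A" from "page P after prefix B".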