pymllm.mem_cache.mamba_radix_cache
==================================

.. py:module:: pymllm.mem_cache.mamba_radix_cache

.. autoapi-nested-parse::

   Radix-tree KV cache with independent Mamba/SSM state tracking.

   Extends :class:`~pymllm.mem_cache.radix_cache.RadixCache` with dual-tracked
   state for hybrid models that combine full attention layers and SSM
   (Mamba / GDN) layers. Each tree node stores both:

   - ``value``: KV-pool indices for full-attention layers
   - ``mamba_value``: state-pool indices for SSM layers

   The two pools have **independent reference counting and LRU eviction**:
   Mamba state can be evicted more aggressively than full KV cache.

   Reference: sglang ``MambaRadixCache``.


Attributes
----------

.. autoapisummary::

   pymllm.mem_cache.mamba_radix_cache.logger


Classes
-------

.. autoapisummary::

   pymllm.mem_cache.mamba_radix_cache.MambaTreeNode
   pymllm.mem_cache.mamba_radix_cache.LRUList
   pymllm.mem_cache.mamba_radix_cache.MambaRadixCache


Module Contents
---------------

.. py:data:: logger

.. py:class:: MambaTreeNode

   Tree node with dual KV + Mamba state tracking.

   Invariant: ``full_lock_ref >= mamba_lock_ref``. If Mamba state is locked,
   full KV must also be locked; full KV alone can be locked without locking
   Mamba state.


   .. py:attribute:: __slots__
      :value: ('children', 'parent', 'key', 'value', 'mamba_value', 'full_lock_ref', 'mamba_lock_ref',...


   .. py:attribute:: children
      :type:  Dict[Any, MambaTreeNode]


   .. py:attribute:: parent
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: key
      :type:  Optional[pymllm.mem_cache.base_prefix_cache.RadixKey]
      :value: None


   .. py:attribute:: value
      :type:  Optional[torch.Tensor]
      :value: None


   .. py:attribute:: mamba_value
      :type:  Optional[torch.Tensor]
      :value: None


   .. py:attribute:: full_lock_ref
      :type:  int
      :value: 0


   .. py:attribute:: mamba_lock_ref
      :type:  int
      :value: 0


   .. py:attribute:: last_access_time
      :type:  float


   .. py:attribute:: hit_count
      :type:  int
      :value: 0


   .. py:attribute:: id
      :type:  int
      :value: 1


   .. py:attribute:: prev
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: next
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: mamba_prev
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: mamba_next
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:property:: evicted
      :type: bool


   .. py:property:: mamba_tombstone
      :type: bool

      Node has full KV but Mamba state was evicted.


   .. py:method:: __lt__(other)


.. py:class:: LRUList(mamba = False)

   Intrusive doubly-linked list for LRU ordering.

   Supports two modes via *mamba* flag: uses ``prev``/``next`` or
   ``mamba_prev``/``mamba_next`` pointers on :class:`MambaTreeNode`.


   .. py:attribute:: mamba
      :value: False


   .. py:attribute:: head


   .. py:attribute:: tail


   .. py:method:: __len__()


   .. py:method:: __contains__(node)


   .. py:method:: insert_mru(node)

      Insert *node* at the MRU (head) position.


   .. py:method:: remove(node)

      Remove *node* from the list.


   .. py:method:: touch_mru(node)

      Move an existing *node* to the MRU position.


   .. py:method:: touch_node_and_parents_mru(node, root)

      Move *node* and all ancestors up to *root* to MRU.

      Child is more recently used than parent.


   .. py:method:: get_lru_leaf_unlocked()

      Return the LRU leaf node with lock_ref == 0, or ``None``.


   .. py:method:: get_lru_unlocked()

      Return the LRU node with lock_ref == 0, or ``None``.


.. py:class:: MambaRadixCache(page_size = 1, token_to_kv_pool_allocator = None, mamba_pool = None, on_node_evict = None)

   Bases: :py:obj:`pymllm.mem_cache.base_prefix_cache.BasePrefixCache`


   Radix tree with independent Mamba/SSM state tracking.

   :param page_size: Number of tokens per KV-pool page.
   :param token_to_kv_pool_allocator: Pool allocator for full-attention KV indices.
   :param mamba_pool: Pool object for Mamba/SSM state. Must support
                      ``alloc_track_slot()``, ``free_track_slot(slot)``,
                      ``copy_states(src, dst)``.
   :param on_node_evict: Optional callback invoked with node id on eviction.


   .. py:attribute:: page_size
      :value: 1


   .. py:attribute:: pool
      :value: None


   .. py:attribute:: mamba_pool
      :value: None


   .. py:attribute:: on_node_evict
      :value: None


   .. py:attribute:: full_lru


   .. py:attribute:: mamba_lru


   .. py:method:: evictable_size()


   .. py:method:: protected_size()


   .. py:method:: mamba_evictable_size()


   .. py:method:: mamba_protected_size()


   .. py:method:: total_size()


   .. py:method:: reset()

      Clear all cached state and re-initialise.


   .. py:method:: match_prefix(key)

      Find longest cached prefix.

      Also returns ``mamba_branching_seqlen``.


   .. py:method:: insert(key, value = None, *, mamba_value = None, **kwargs)

      Insert with both full KV and Mamba state values.


   .. py:method:: evict(num_tokens, swa_num_tokens = 0)

      Evict full KV and/or Mamba state tokens.

      Phase 1: Evict full KV leaves (frees both KV and Mamba state).
      Phase 2: Evict Mamba state from internal nodes (tombstone mamba).


   .. py:method:: inc_lock_ref(node)

      Lock full KV and Mamba state from *node* to root.

      Full lock propagates up to root. Mamba lock only applies to the node
      itself (not ancestors).


   .. py:method:: dec_lock_ref(node, **kwargs)

      Unlock full KV and Mamba state.


   .. py:method:: pretty_print()

      Print the tree structure to stdout.
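
The locking rule described for ``inc_lock_ref`` and ``dec_lock_ref`` (the full KV lock propagates towards the root, the Mamba lock applies to the node alone) can be illustrated with a minimal sketch. This is not the ``pymllm`` implementation: ``SketchNode`` and the two free functions are hypothetical stand-ins that model only the two counters, and counting the root node itself in the full lock is an assumption made here for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for MambaTreeNode; only lock counters are modelled."""

    name: str
    parent: Optional["SketchNode"] = None
    full_lock_ref: int = 0
    mamba_lock_ref: int = 0


def inc_lock_ref(node: SketchNode) -> None:
    """Lock Mamba state on *node* only; lock full KV on *node* and all ancestors."""
    node.mamba_lock_ref += 1
    cur: Optional[SketchNode] = node
    while cur is not None:  # assumption: the root itself is counted too
        cur.full_lock_ref += 1
        cur = cur.parent


def dec_lock_ref(node: SketchNode) -> None:
    """Undo one matching inc_lock_ref call on *node*."""
    assert node.mamba_lock_ref > 0, "unbalanced Mamba unlock"
    node.mamba_lock_ref -= 1
    cur: Optional[SketchNode] = node
    while cur is not None:
        assert cur.full_lock_ref > 0, "unbalanced full KV unlock"
        cur.full_lock_ref -= 1
        cur = cur.parent


def invariant_holds(node: SketchNode) -> bool:
    """Check full_lock_ref >= mamba_lock_ref on *node* and every ancestor."""
    cur: Optional[SketchNode] = node
    while cur is not None:
        if cur.full_lock_ref < cur.mamba_lock_ref:
            return False
        cur = cur.parent
    return True


# Build a three-node chain: root -> a -> b.
root = SketchNode("root")
a = SketchNode("a", parent=root)
b = SketchNode("b", parent=a)

inc_lock_ref(b)  # locks b's Mamba state and the full KV of b, a, root
inc_lock_ref(a)  # locks a's Mamba state and the full KV of a, root
```

Under these assumptions, an ancestor's ``full_lock_ref`` counts every lock taken at or below it, while its ``mamba_lock_ref`` counts only locks taken on that node directly, so the invariant ``full_lock_ref >= mamba_lock_ref`` holds by construction. It also shows why an internal node can keep its KV pinned while its Mamba state remains unlocked and therefore evictable.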