pymllm.mem_cache.mamba_radix_cache
==================================

.. py:module:: pymllm.mem_cache.mamba_radix_cache

.. autoapi-nested-parse::

   Radix-tree KV cache with independent Mamba/SSM state tracking.

   Extends :class:`~pymllm.mem_cache.radix_cache.RadixCache` with dual-tracked
   state for hybrid models that combine full attention layers and SSM (Mamba /
   GDN) layers.  Each tree node stores both:

   - ``value``: KV-pool indices for full-attention layers
   - ``mamba_value``: state-pool indices for SSM layers

   The two pools have **independent reference counting and LRU eviction**,
   so Mamba state can be evicted more aggressively than the full KV cache:
   a node may keep its KV indices after its Mamba state has been freed.
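
   As a minimal illustration of independent LRU eviction (a toy model, not this
   module's code), two ``collections.OrderedDict`` instances can stand in for
   the KV and Mamba LRU orders.  The node ids and string values are hypothetical
   placeholders for pool indices:

   .. code-block:: python

      from collections import OrderedDict

      # Toy model: node id -> placeholder for a KV index / a Mamba state slot.
      # The two OrderedDicts are independent LRU orders (oldest entry first).
      full_lru = OrderedDict()
      mamba_lru = OrderedDict()

      for node_id in (1, 2, 3):
          full_lru[node_id] = f"kv{node_id}"
          mamba_lru[node_id] = f"ssm{node_id}"

      # A cache hit on node 1 refreshes it in both orders.
      full_lru.move_to_end(1)
      mamba_lru.move_to_end(1)

      # Under SSM-pool pressure, evict only the least recently used Mamba state.
      victim, _ = mamba_lru.popitem(last=False)

      print(victim)               # 2
      print(victim in full_lru)   # True: its KV entry is kept
      print(list(mamba_lru))      # [3, 1]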

   Reference: sglang ``MambaRadixCache``.


Attributes
----------

.. autoapisummary::

   pymllm.mem_cache.mamba_radix_cache.logger


Classes
-------

.. autoapisummary::

   pymllm.mem_cache.mamba_radix_cache.MambaTreeNode
   pymllm.mem_cache.mamba_radix_cache.LRUList
   pymllm.mem_cache.mamba_radix_cache.MambaRadixCache


Module Contents
---------------

.. py:data:: logger

.. py:class:: MambaTreeNode

   Tree node with dual KV + Mamba state tracking.

   Invariant: ``full_lock_ref >= mamba_lock_ref``.  If Mamba state is
   locked, full KV must also be locked; full KV alone can be locked
   without locking Mamba state.
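
   A minimal sketch of the invariant, using a hypothetical ``ToyNode``
   dataclass that carries only the two lock counters:

   .. code-block:: python

      from dataclasses import dataclass


      @dataclass
      class ToyNode:
          # Hypothetical stand-in holding only MambaTreeNode's two lock counters.
          full_lock_ref: int = 0
          mamba_lock_ref: int = 0


      def lock_invariant_holds(node: ToyNode) -> bool:
          """Every Mamba lock must be matched by a full-KV lock."""
          return node.full_lock_ref >= node.mamba_lock_ref


      print(lock_invariant_holds(ToyNode(full_lock_ref=1, mamba_lock_ref=1)))  # True: both locked
      print(lock_invariant_holds(ToyNode(full_lock_ref=2, mamba_lock_ref=0)))  # True: KV-only lock
      print(lock_invariant_holds(ToyNode(full_lock_ref=0, mamba_lock_ref=1)))  # False: forbidden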


   .. py:attribute:: __slots__
      :value: ('children', 'parent', 'key', 'value', 'mamba_value', 'full_lock_ref', 'mamba_lock_ref',...


   .. py:attribute:: children
      :type:  Dict[Any, MambaTreeNode]


   .. py:attribute:: parent
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: key
      :type:  Optional[pymllm.mem_cache.base_prefix_cache.RadixKey]
      :value: None


   .. py:attribute:: value
      :type:  Optional[torch.Tensor]
      :value: None


   .. py:attribute:: mamba_value
      :type:  Optional[torch.Tensor]
      :value: None


   .. py:attribute:: full_lock_ref
      :type:  int
      :value: 0


   .. py:attribute:: mamba_lock_ref
      :type:  int
      :value: 0


   .. py:attribute:: last_access_time
      :type:  float


   .. py:attribute:: hit_count
      :type:  int
      :value: 0


   .. py:attribute:: id
      :type:  int
      :value: 1


   .. py:attribute:: prev
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: next
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: mamba_prev
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:attribute:: mamba_next
      :type:  Optional[MambaTreeNode]
      :value: None


   .. py:property:: evicted
      :type: bool


   .. py:property:: mamba_tombstone
      :type: bool


      ``True`` if the node still holds full KV but its Mamba state has been evicted.
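
      A sketch of how the two status properties could be derived from
      ``value`` and ``mamba_value``, assuming (not confirmed by this page) that
      ``None`` marks an evicted pool entry.  ``ToyNode`` is hypothetical and
      uses lists instead of tensors:

      .. code-block:: python

         from typing import Optional


         class ToyNode:
             """Hypothetical stand-in for MambaTreeNode with only the two value slots."""

             def __init__(self, value: Optional[list] = None, mamba_value: Optional[list] = None):
                 self.value = value              # full-attention KV indices
                 self.mamba_value = mamba_value  # SSM state-pool indices

             @property
             def evicted(self) -> bool:
                 # Assumption: a node counts as evicted once its full KV is gone.
                 return self.value is None

             @property
             def mamba_tombstone(self) -> bool:
                 # Full KV still present, Mamba state evicted.
                 return self.value is not None and self.mamba_value is None


         node = ToyNode(value=[10, 11, 12], mamba_value=[3])
         node.mamba_value = None  # evict only the SSM state
         print(node.evicted, node.mamba_tombstone)  # False True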


   .. py:method:: __lt__(other)


.. py:class:: LRUList(mamba = False)

   Intrusive doubly-linked list for LRU ordering.

   The *mamba* flag selects which pointer pair on :class:`MambaTreeNode` the
   list uses: ``prev``/``next`` (``mamba=False``) or
   ``mamba_prev``/``mamba_next`` (``mamba=True``).
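
   The sketch below is a simplified intrusive list that assumes sentinel
   ``head``/``tail`` nodes; ``ToyNode`` and ``ToyLRUList`` are hypothetical
   names.  It shows why two pointer pairs are useful: the same node can sit in
   the full-KV list and in the Mamba list at different positions.

   .. code-block:: python

      class ToyNode:
          """Hypothetical node carrying both pointer pairs, like MambaTreeNode."""

          def __init__(self, name=""):
              self.name = name
              self.prev = self.next = None
              self.mamba_prev = self.mamba_next = None


      class ToyLRUList:
          def __init__(self, mamba=False):
              # Pick which pointer pair this list threads through.
              self._p, self._n = ("mamba_prev", "mamba_next") if mamba else ("prev", "next")
              self.head, self.tail = ToyNode("head"), ToyNode("tail")  # sentinels
              setattr(self.head, self._n, self.tail)
              setattr(self.tail, self._p, self.head)

          def insert_mru(self, node):
              first = getattr(self.head, self._n)
              setattr(node, self._p, self.head)
              setattr(node, self._n, first)
              setattr(self.head, self._n, node)
              setattr(first, self._p, node)

          def remove(self, node):
              before, after = getattr(node, self._p), getattr(node, self._n)
              setattr(before, self._n, after)
              setattr(after, self._p, before)
              setattr(node, self._p, None)
              setattr(node, self._n, None)

          def touch_mru(self, node):
              self.remove(node)
              self.insert_mru(node)

          def names(self):
              """Node names in MRU -> LRU order, for inspection."""
              out, cur = [], getattr(self.head, self._n)
              while cur is not self.tail:
                  out.append(cur.name)
                  cur = getattr(cur, self._n)
              return out


      a, b = ToyNode("a"), ToyNode("b")
      full, mamba = ToyLRUList(), ToyLRUList(mamba=True)
      for node in (a, b):
          full.insert_mru(node)
          mamba.insert_mru(node)
      mamba.touch_mru(a)  # reorders only the Mamba list
      print(full.names(), mamba.names())  # ['b', 'a'] ['a', 'b']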


   .. py:attribute:: mamba
      :value: False


   .. py:attribute:: head


   .. py:attribute:: tail


   .. py:method:: __len__()


   .. py:method:: __contains__(node)


   .. py:method:: insert_mru(node)

      Insert *node* at the MRU (head) position.


   .. py:method:: remove(node)

      Remove *node* from the list.


   .. py:method:: touch_mru(node)

      Move an existing *node* to the MRU position.


   .. py:method:: touch_node_and_parents_mru(node, root)

      Move *node* and all ancestors up to *root* to MRU.

      Each child ends up more recently used than its parent.
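
      A sketch of the resulting order, using a plain Python list (MRU first)
      instead of the intrusive list; ``ToyNode`` and the standalone function
      are hypothetical:

      .. code-block:: python

         class ToyNode:
             def __init__(self, name, parent=None):
                 self.name, self.parent = name, parent


         def touch_node_and_parents_mru(order, node, root):
             """Move node and its ancestors below root to the MRU end of order.

             The child is placed ahead of its parent, so each ancestor ends up
             less recently used than the node beneath it.
             """
             chain = []
             while node is not root:
                 chain.append(node)
                 node = node.parent
             rest = [n for n in order if n not in chain]
             order[:] = chain + rest


         root = ToyNode("root")
         a = ToyNode("a", root)
         b = ToyNode("b", a)
         c = ToyNode("c", root)
         order = [c, b, a]
         touch_node_and_parents_mru(order, b, root)
         print([n.name for n in order])  # ['b', 'a', 'c']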


   .. py:method:: get_lru_leaf_unlocked()

      Return the least recently used leaf node with ``lock_ref == 0``, or ``None`` if there is none.


   .. py:method:: get_lru_unlocked()

      Return the least recently used node (leaf or internal) with ``lock_ref == 0``, or ``None`` if there is none.
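
      A sketch contrasting the two scans over a plain list (MRU first).  It
      assumes a single ``lock_ref`` counter for brevity; in the real list the
      relevant counter presumably depends on the *mamba* mode.  ``ToyNode`` and
      both functions are hypothetical stand-ins:

      .. code-block:: python

         class ToyNode:
             def __init__(self, name, lock_ref=0):
                 self.name, self.lock_ref = name, lock_ref
                 self.children = {}


         def get_lru_unlocked(order):
             """Scan from the LRU end (order is MRU first) for an unlocked node."""
             for node in reversed(order):
                 if node.lock_ref == 0:
                     return node
             return None


         def get_lru_leaf_unlocked(order):
             """Same scan, but only childless (leaf) nodes qualify."""
             for node in reversed(order):
                 if node.lock_ref == 0 and not node.children:
                     return node
             return None


         parent = ToyNode("parent")
         leaf = ToyNode("leaf")
         locked_leaf = ToyNode("locked_leaf", lock_ref=1)
         parent.children = {1: leaf}
         order = [leaf, locked_leaf, parent]  # parent is least recently used
         print(get_lru_unlocked(order).name)       # parent
         print(get_lru_leaf_unlocked(order).name)  # leaf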


.. py:class:: MambaRadixCache(page_size = 1, token_to_kv_pool_allocator = None, mamba_pool = None, on_node_evict = None)

   Bases: :py:obj:`pymllm.mem_cache.base_prefix_cache.BasePrefixCache`


   Radix tree with independent Mamba/SSM state tracking.

   :param page_size: Number of tokens per KV-pool page.
   :param token_to_kv_pool_allocator: Pool allocator for full-attention KV indices.
   :param mamba_pool: Pool object for Mamba/SSM state.  Must support ``alloc_track_slot()``,
                      ``free_track_slot(slot)``, and ``copy_states(src, dst)``.
   :param on_node_evict: Optional callback invoked with the evicted node's ``id``.
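
   The ``mamba_pool`` requirements can be written as a
   :class:`typing.Protocol`.  This is a sketch: the argument and return types
   are assumptions, and ``ToyMambaPool`` (including returning ``None`` when no
   slot is free) is hypothetical:

   .. code-block:: python

      from typing import List, Protocol, runtime_checkable


      @runtime_checkable
      class MambaPoolLike(Protocol):
          """The three methods the docs require of ``mamba_pool``."""

          def alloc_track_slot(self): ...
          def free_track_slot(self, slot) -> None: ...
          def copy_states(self, src, dst) -> None: ...


      class ToyMambaPool:
          """Hypothetical in-memory pool: each slot holds one scalar 'state'."""

          def __init__(self, size: int):
              self.states: List[float] = [0.0] * size
              self.free: List[int] = list(range(size))

          def alloc_track_slot(self):
              return self.free.pop() if self.free else None

          def free_track_slot(self, slot) -> None:
              self.free.append(slot)

          def copy_states(self, src, dst) -> None:
              self.states[dst] = self.states[src]


      pool = ToyMambaPool(size=2)
      print(isinstance(pool, MambaPoolLike))  # True: all three methods exist
      s0 = pool.alloc_track_slot()
      s1 = pool.alloc_track_slot()
      pool.states[s0] = 1.5  # pretend an SSM state was written
      pool.copy_states(s0, s1)
      print(pool.states[s1], pool.alloc_track_slot())  # 1.5 None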


   .. py:attribute:: page_size
      :value: 1


   .. py:attribute:: pool
      :value: None


   .. py:attribute:: mamba_pool
      :value: None


   .. py:attribute:: on_node_evict
      :value: None


   .. py:attribute:: full_lru


   .. py:attribute:: mamba_lru


   .. py:method:: evictable_size()


   .. py:method:: protected_size()


   .. py:method:: mamba_evictable_size()


   .. py:method:: mamba_protected_size()


   .. py:method:: total_size()


   .. py:method:: reset()

      Clear all cached state and re-initialise.


   .. py:method:: match_prefix(key)

      Find the longest cached prefix of *key*.  In addition to the match, also returns ``mamba_branching_seqlen``.
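
      A sketch of prefix matching on an uncompressed per-token trie.  It
      assumes, as is typical for recurrent state, that a hybrid model can only
      resume from a position whose SSM state is still cached, so it reports
      both the longest match and the deepest match that still has state.  It
      does not reproduce ``mamba_branching_seqlen``; ``TrieNode`` and
      ``has_mamba_state`` are hypothetical:

      .. code-block:: python

         from typing import Dict, List, Optional, Tuple


         class TrieNode:
             def __init__(self):
                 self.children: Dict[int, "TrieNode"] = {}
                 self.has_mamba_state = False  # hypothetical: SSM state cached here


         def insert(root: TrieNode, tokens: List[int], keep_state: bool = True) -> None:
             node = root
             for tok in tokens:
                 node = node.children.setdefault(tok, TrieNode())
             node.has_mamba_state = keep_state


         def match_prefix(root: TrieNode, tokens: List[int]) -> Tuple[int, int]:
             """Return (longest matched length, deepest matched length with SSM state)."""
             node, matched, resumable = root, 0, 0
             for tok in tokens:
                 nxt: Optional[TrieNode] = node.children.get(tok)
                 if nxt is None:
                     break
                 node, matched = nxt, matched + 1
                 if node.has_mamba_state:
                     resumable = matched
             return matched, resumable


         root = TrieNode()
         insert(root, [1, 2, 3])
         insert(root, [1, 2, 3, 4, 5], keep_state=False)  # KV only, state evicted
         print(match_prefix(root, [1, 2, 3, 4, 5, 6]))  # (5, 3)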


   .. py:method:: insert(key, value = None, *, mamba_value = None, **kwargs)

      Insert *key* with its full-KV indices (*value*) and its Mamba state indices (*mamba_value*).


   .. py:method:: evict(num_tokens, swa_num_tokens = 0)

      Evict full KV and/or Mamba state tokens.

      - Phase 1: evict full-KV leaves, which frees both their KV and Mamba state.
      - Phase 2: evict Mamba state from internal nodes, turning them into Mamba tombstones.
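
      A toy version of the two phases, assuming nodes are visited in LRU order
      and freed Mamba states are counted per node.  ``ToyNode`` and the
      *num_mamba* parameter are hypothetical, and real eviction would also
      detach freed leaves from the tree:

      .. code-block:: python

         from dataclasses import dataclass, field
         from typing import List, Optional


         @dataclass
         class ToyNode:
             name: str
             num_tokens: int
             value: Optional[str] = "kv"         # full KV present
             mamba_value: Optional[str] = "ssm"  # Mamba state present
             children: List["ToyNode"] = field(default_factory=list)


         def evict(lru_order: List[ToyNode], num_tokens: int, num_mamba: int):
             """Toy two-phase eviction; lru_order lists nodes least recently used first."""
             freed_kv = 0
             # Phase 1: evict whole leaves, dropping KV and Mamba state together.
             for node in lru_order:
                 if freed_kv >= num_tokens:
                     break
                 if not node.children and node.value is not None:
                     node.value = node.mamba_value = None
                     freed_kv += node.num_tokens
             # Phase 2: drop Mamba state from internal nodes, keeping their KV.
             freed_mamba = 0
             for node in lru_order:
                 if freed_mamba >= num_mamba:
                     break
                 if node.children and node.mamba_value is not None:
                     node.mamba_value = None  # node becomes a Mamba tombstone
                     freed_mamba += 1
             return freed_kv, freed_mamba


         inner = ToyNode("inner", 4)
         leaf = ToyNode("leaf", 2)
         inner.children.append(leaf)
         print(evict([leaf, inner], num_tokens=2, num_mamba=1))  # (2, 1)
         print(inner.value, inner.mamba_value)  # kv None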


   .. py:method:: inc_lock_ref(node)

      Lock *node* against eviction.

      The full-KV lock is propagated from *node* up to the root; the Mamba
      lock applies only to *node* itself, not to its ancestors.
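
      A sketch of the lock propagation and its inverse with a hypothetical
      ``ToyNode``.  It assumes the root itself is never locked and omits any
      evictable/protected size bookkeeping:

      .. code-block:: python

         class ToyNode:
             def __init__(self, parent=None):
                 self.parent = parent
                 self.full_lock_ref = 0
                 self.mamba_lock_ref = 0


         def inc_lock_ref(node, root):
             node.mamba_lock_ref += 1  # Mamba lock: this node only
             while node is not root:   # full lock: node and every ancestor below root
                 node.full_lock_ref += 1
                 node = node.parent


         def dec_lock_ref(node, root):
             node.mamba_lock_ref -= 1  # exact inverse of inc_lock_ref
             while node is not root:
                 node.full_lock_ref -= 1
                 node = node.parent


         root = ToyNode()
         mid = ToyNode(root)
         leaf = ToyNode(mid)
         inc_lock_ref(leaf, root)
         print(leaf.full_lock_ref, leaf.mamba_lock_ref)  # 1 1
         print(mid.full_lock_ref, mid.mamba_lock_ref)    # 1 0
         dec_lock_ref(leaf, root)
         print(leaf.full_lock_ref, mid.full_lock_ref)    # 0 0

      Because every Mamba lock comes with a full lock on the same node, the
      invariant ``full_lock_ref >= mamba_lock_ref`` holds on every node.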


   .. py:method:: dec_lock_ref(node, **kwargs)

      Release the full-KV and Mamba locks taken by :meth:`inc_lock_ref`.


   .. py:method:: pretty_print()

      Print the tree structure to stdout.