pymllm.mem_cache.radix_cache
============================

.. py:module:: pymllm.mem_cache.radix_cache

.. autoapi-nested-parse::

   Radix-tree KV cache with SWA and multimodal support.

   Supports:
       - Multi-batch serving on a single GPU
       - Sliding Window Attention (SWA) via tombstone mechanism
       - Multimodal namespace isolation via ``extra_key``
       - SHA256 position-aware hashing
       - Page-aligned operations (page_size >= 1)
       - LRU leaf eviction


Attributes
----------

.. autoapisummary::

   pymllm.mem_cache.radix_cache.logger


Classes
-------

.. autoapisummary::

   pymllm.mem_cache.radix_cache.TreeNode
   pymllm.mem_cache.radix_cache.RadixCache


Module Contents
---------------

.. py:data:: logger

.. py:class:: TreeNode

   A single node in the radix tree.

   ``value`` holds a 1-D ``int64`` tensor of KV-pool indices (one per token
   in ``key``).  When the node has been evicted, ``value`` is ``None``.


   .. py:attribute:: __slots__
      :value: ('children', 'parent', 'key', 'value', 'lock_ref', 'swa_lock_ref', 'swa_tombstone',...


   .. py:attribute:: children
      :type:  Dict[Any, TreeNode]


   .. py:attribute:: parent
      :type:  Optional[TreeNode]
      :value: None


   .. py:attribute:: key
      :type:  Optional[pymllm.mem_cache.base_prefix_cache.RadixKey]
      :value: None


   .. py:attribute:: value
      :type:  Optional[torch.Tensor]
      :value: None


   .. py:attribute:: lock_ref
      :type:  int
      :value: 0


   .. py:attribute:: swa_lock_ref
      :type:  int
      :value: 0


   .. py:attribute:: swa_tombstone
      :type:  bool
      :value: False


   .. py:attribute:: swa_boundary_id
      :type:  Optional[int]
      :value: None


   .. py:attribute:: last_access_time
      :type:  float


   .. py:attribute:: hit_count
      :type:  int
      :value: 0


   .. py:attribute:: hash_values
      :type:  Optional[List[str]]
      :value: None


   .. py:attribute:: id
      :type:  int
      :value: 1


   .. py:property:: evicted
      :type: bool


   .. py:method:: __lt__(other)


.. py:class:: RadixCache(page_size = 1, sliding_window_size = None, token_to_kv_pool_allocator = None, on_node_evict = None)

   Bases: :py:obj:`pymllm.mem_cache.base_prefix_cache.BasePrefixCache`


   Radix tree for KV-cache prefix sharing.

   :param page_size: Number of tokens per KV-pool page.  Keys and values are aligned to
                     this granularity.
   :param sliding_window_size: If set, enables SWA mode.  The cache tracks which nodes have had
                               their SWA KV freed (tombstoned) and constrains prefix matching
                               so that the sliding-window invariant is maintained.
   :param token_to_kv_pool_allocator: Optional pool allocator with ``free(indices)`` (and ``free_swa`` for
                                      SWA mode).  When *None*, index tensors are simply discarded.
   :param on_node_evict: Optional callback invoked with the node id when a node is evicted.


   .. py:attribute:: page_size
      :value: 1


   .. py:attribute:: sliding_window_size
      :value: None


   .. py:attribute:: pool
      :value: None


   .. py:attribute:: on_node_evict
      :value: None


   .. py:property:: supports_swa
      :type: bool


   .. py:method:: evictable_size()


   .. py:method:: swa_evictable_size()


   .. py:method:: protected_size()


   .. py:method:: swa_protected_size()


   .. py:method:: total_size()

      Total number of cached tokens (including tombstoned).


   .. py:method:: reset()

      Clear all cached state and re-initialise the root node.


   .. py:method:: match_prefix(key)

      Find the longest cached prefix of *key*.

      For SWA mode the match is further constrained: the path from the
      returned ``last_node`` to root must have at least
      ``sliding_window_size`` non-tombstone tokens (or be entirely
      tombstone-free back to root).

      Accessing a prefix refreshes LRU timestamps along the matched path.


   .. py:method:: insert(key, value = None, *, prev_prefix_len = 0, swa_evicted_seqlen = 0, **kwargs)

      Insert *key*/*value* into the tree.

      Returns how many leading tokens were already present (the prefix
      length).  The caller is responsible for freeing duplicate KV indices
      in the range ``[cache_protected_len, prefix_len)``.

      :param prev_prefix_len: (SWA mode) tokens before this offset are already protected and
                              should not have their values overwritten.
      :param swa_evicted_seqlen: (SWA mode) the sequence length up to which SWA KV has been
                                 previously evicted.  Used to decide whether a tombstoned node can
                                 be un-tombstoned with the incoming value.


   .. py:method:: evict(num_tokens, swa_num_tokens = 0)

      Evict up to *num_tokens* (full) and *swa_num_tokens* (SWA) tokens.

      Full eviction removes leaf nodes entirely; SWA eviction tombstones
      internal nodes (freeing SWA KV but retaining full-attn KV).


   .. py:method:: inc_lock_ref(node)

      Lock nodes from *node* up to root (prevents eviction).

      Returns ``swa_boundary_id`` that must be passed back to
      :meth:`dec_lock_ref`.  In non-SWA mode, returns ``None``.


   .. py:method:: dec_lock_ref(node, swa_boundary_id = None, **kwargs)

      Unlock nodes from *node* up to root.


   .. py:method:: compute_node_hash(node)

      Compute position-aware SHA-256 hashes for *node* (one per page).

      Lazily computed and cached on ``node.hash_values``.


   .. py:method:: pretty_print()

      Print the tree structure to stdout.