pymllm.orchestrator.model_runner_process
========================================

.. py:module:: pymllm.orchestrator.model_runner_process

.. autoapi-nested-parse::

   ModelRunnerProcess -- GPU-owning component that executes model forward passes.

   Instantiated **in-process** by :class:`SchedulerProcess`. The scheduler calls
   :meth:`_forward_batch` directly — no inter-process communication is involved.

   This component owns the GPU: it holds a :class:`ModelRunner` with model
   weights, KV-cache memory pools, and the attention backend. It also owns the
   :class:`RadixCache` for prefix-aware KV reuse.

   RadixCache lifecycle
   --------------------

   1. **match_prefix** — called during ``_allocate_extend`` before KV allocation.
   2. **inc_lock_ref** — locks matched radix-tree nodes to prevent eviction.
   3. **insert (prefill)** — inserts prompt KV indices after prefill.
   4. **insert (completion)** — re-inserts the full sequence when a request finishes.
   5. **dec_lock_ref** — unlocks radix-tree nodes when a request is freed.
   6. **evict** — called when KV allocation fails, to free stale cache entries.

Attributes
----------

.. autoapisummary::

   pymllm.orchestrator.model_runner_process.logger

Classes
-------

.. autoapisummary::

   pymllm.orchestrator.model_runner_process.ModelRunnerProcess

Module Contents
---------------

.. py:data:: logger

.. py:class:: ModelRunnerProcess(gpu_id = 0, server_config = None, model_config = None)

   GPU-owning component created in-process by :class:`SchedulerProcess`.

   .. py:method:: init_model()

      Create and initialise the ModelRunner and RadixCache.

      Must run inside the subprocess (after spawn) since it does CUDA init.

   .. py:method:: shutdown()
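
The RadixCache lifecycle above can be illustrated with a toy prefix cache. ``MiniRadixCache`` and its signatures are simplified stand-ins invented for this sketch (the real :class:`RadixCache` stores KV indices in a radix tree over token sequences); only the order of the six documented calls is meant to mirror the scheduler's behaviour.

```python
class MiniRadixCache:
    """Toy prefix cache keyed on token tuples, mimicking the call order.

    This is a hypothetical stand-in, not pymllm's RadixCache.
    """

    def __init__(self):
        self.store = {}  # prefix tuple -> KV indices
        self.locks = {}  # prefix tuple -> lock refcount

    def match_prefix(self, tokens):
        # Step 1: find the longest cached prefix of `tokens`.
        for n in range(len(tokens), 0, -1):
            key = tuple(tokens[:n])
            if key in self.store:
                return key, self.store[key]
        return (), []

    def inc_lock_ref(self, key):
        # Step 2: pin matched nodes so eviction cannot reclaim them.
        self.locks[key] = self.locks.get(key, 0) + 1

    def insert(self, tokens, kv_indices):
        # Steps 3-4: record KV indices for this token sequence
        # (after prefill, and again with the full sequence on completion).
        self.store[tuple(tokens)] = kv_indices

    def dec_lock_ref(self, key):
        # Step 5: unpin when the request is freed.
        self.locks[key] -= 1

    def evict(self):
        # Step 6: drop unlocked entries to reclaim KV slots.
        self.store = {k: v for k, v in self.store.items()
                      if self.locks.get(k, 0) > 0}


cache = MiniRadixCache()
cache.insert([1, 2, 3], [10, 11, 12])          # an earlier request left a prefix
key, kv = cache.match_prefix([1, 2, 3, 4, 5])  # step 1: reuse [1, 2, 3]
cache.inc_lock_ref(key)                        # step 2: protect it
cache.insert([1, 2, 3, 4, 5], kv + [13, 14])   # steps 3-4: full sequence
cache.dec_lock_ref(key)                        # step 5: release
cache.evict()                                  # step 6: unlocked entries go
```

In the real component the same ordering matters for correctness: nodes must be locked between ``match_prefix`` and the request's release, and ``evict`` may only reclaim unlocked entries.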