pymllm.orchestrator.model_runner_process
========================================

.. py:module:: pymllm.orchestrator.model_runner_process

.. autoapi-nested-parse::

   ModelRunnerProcess -- GPU-owning component that executes model forward passes.

   Instantiated **in-process** by :class:`SchedulerProcess`. The scheduler calls
   :meth:`_forward_batch` directly — no inter-process communication is involved.

   This component owns the GPU: it holds a :class:`ModelRunner` with model
   weights, KV-cache memory pools, and the attention backend. It also owns the
   :class:`RadixCache` for prefix-aware KV reuse.

   RadixCache lifecycle
   --------------------

   1. **match_prefix** — called during ``_allocate_extend`` before KV allocation.
   2. **inc_lock_ref** — locks matched radix-tree nodes to prevent eviction.
   3. **insert (prefill)** — inserts prompt KV indices after prefill.
   4. **insert (completion)** — re-inserts the full sequence when a request finishes.
   5. **dec_lock_ref** — unlocks radix-tree nodes when a request is freed.
   6. **evict** — called when KV allocation fails, to free stale cache entries.

Attributes
----------

.. autoapisummary::

   pymllm.orchestrator.model_runner_process.logger

Classes
-------

.. autoapisummary::

   pymllm.orchestrator.model_runner_process.ModelRunnerProcess

Module Contents
---------------

.. py:data:: logger

.. py:class:: ModelRunnerProcess(gpu_id = 0, server_config = None, model_config = None)

   GPU-owning component created in-process by :class:`SchedulerProcess`.

   .. py:method:: init_model()

      Create and initialise the ModelRunner and RadixCache.

      Must run inside the subprocess (after spawn) since it does CUDA init.

   .. py:method:: shutdown()
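
The RadixCache lifecycle above can be illustrated with a toy prefix cache. ``MiniRadixCache`` and its signatures are simplified stand-ins invented for this sketch (the real :class:`RadixCache` stores KV indices in a radix tree over token sequences); only the order of the six documented calls is meant to mirror the scheduler's behaviour.

```python
class MiniRadixCache:
    """Toy prefix cache keyed on token tuples, mimicking the call order.

    This is a hypothetical stand-in, not pymllm's RadixCache.
    """

    def __init__(self):
        self.store = {}  # prefix tuple -> KV indices
        self.locks = {}  # prefix tuple -> lock refcount

    def match_prefix(self, tokens):
        # Step 1: find the longest cached prefix of `tokens`.
        for n in range(len(tokens), 0, -1):
            key = tuple(tokens[:n])
            if key in self.store:
                return key, self.store[key]
        return (), []

    def inc_lock_ref(self, key):
        # Step 2: pin matched nodes so eviction cannot reclaim them.
        self.locks[key] = self.locks.get(key, 0) + 1

    def insert(self, tokens, kv_indices):
        # Steps 3-4: record KV indices for this token sequence
        # (after prefill, and again with the full sequence on completion).
        self.store[tuple(tokens)] = kv_indices

    def dec_lock_ref(self, key):
        # Step 5: unpin when the request is freed.
        self.locks[key] -= 1

    def evict(self):
        # Step 6: drop unlocked entries to reclaim KV slots.
        self.store = {k: v for k, v in self.store.items()
                      if self.locks.get(k, 0) > 0}


cache = MiniRadixCache()
cache.insert([1, 2, 3], [10, 11, 12])          # an earlier request left a prefix
key, kv = cache.match_prefix([1, 2, 3, 4, 5])  # step 1: reuse [1, 2, 3]
cache.inc_lock_ref(key)                        # step 2: protect it
cache.insert([1, 2, 3, 4, 5], kv + [13, 14])   # steps 3-4: full sequence
cache.dec_lock_ref(key)                        # step 5: release
cache.evict()                                  # step 6: unlocked entries go
```

In the real component the same ordering matters for correctness: nodes must be locked between ``match_prefix`` and the request's release, and ``evict`` may only reclaim unlocked entries.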