pymllm.orchestrator.tokenizer_process
=====================================

.. py:module:: pymllm.orchestrator.tokenizer_process

.. autoapi-nested-parse::

   TokenizerProcess -- subprocess that tokenizes incoming raw requests.

   Receives raw requests from RequestResponseProcess via ZMQ, tokenizes them,
   and forwards the tokenized payloads to the SchedulerProcess.

   Supports two transport modes (controlled by ``enable_shared_queue`` and
   ``tensor_transport_mode`` in the tokenizer config):

   1. **Legacy ZMQ path** (``enable_shared_queue=False``): Tokenized objects
      are sent directly via ZMQ ``send_pyobj`` (pickle). This is simple but
      slow for large multimodal tensors.

   2. **Shared queue fast path** (``enable_shared_queue=True``): Metadata is
      written to POSIX shared memory and the queue carries a lightweight
      ``(rid, shm_name, mm_inputs)`` tuple. The GPU tensors inside
      ``mm_inputs`` are transported differently depending on
      ``tensor_transport_mode``:

      * ``"default"`` -- GPU tensors are moved to CPU first (GPU→CPU copy),
        then placed in POSIX shared memory.
      * ``"cuda_ipc"`` -- GPU tensors stay on the GPU; they are wrapped in a
        :class:`TransportProxyTensor` whose pickle uses CUDA IPC handles.
        Simple, but may leak GPU memory.
      * ``"cuda_ipc_pool"`` -- GPU tensors are copied into a pre-allocated
        :class:`MmItemMemoryPool` workspace and shared via pool-chunk IPC
        handles. Chunks are recycled, so no GPU memory is leaked.

Attributes
----------

.. autoapisummary::

   pymllm.orchestrator.tokenizer_process.logger

Classes
-------

.. autoapisummary::

   pymllm.orchestrator.tokenizer_process.TokenizerProcess

Functions
---------

.. autoapisummary::

   pymllm.orchestrator.tokenizer_process.run_tokenizer_process

Module Contents
---------------

.. py:data:: logger

.. py:class:: TokenizerProcess(recv_from_rr_addr, send_to_scheduler_addr, tokenizer_cfg, shared_queue = None)

   Runs inside a subprocess spawned by ``torch.multiprocessing``.

   .. py:method:: init_sockets()

   .. py:method:: event_loop()

      Infinite loop: recv raw request -> tokenize -> send to scheduler.

   .. py:method:: shutdown()

.. py:function:: run_tokenizer_process(recv_from_rr_addr, send_to_scheduler_addr, pipe_writer, tokenizer_cfg, shared_queue = None)

   Entry point for ``torch.multiprocessing.Process(target=...)``.
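The shared-queue fast path described above (metadata in POSIX shared memory, a lightweight ``(rid, shm_name, mm_inputs)`` tuple on the queue) can be sketched with the standard library. The function names ``publish_via_shm`` and ``consume_from_shm`` are illustrative, not part of the module's real API, and Python's ``multiprocessing.shared_memory`` stands in for whatever shared-memory layer pymllm actually uses:

```python
import pickle
from multiprocessing import shared_memory

def publish_via_shm(rid, metadata, mm_inputs, queue_put):
    """Sketch of the producer side (hypothetical helper, not the real API).

    Pickles the tokenized metadata into a POSIX shared-memory segment and
    enqueues only a lightweight (rid, shm_name, mm_inputs) tuple, so the
    queue never carries the bulky payload itself.
    """
    blob = pickle.dumps(metadata)
    shm = shared_memory.SharedMemory(create=True, size=len(blob))
    shm.buf[: len(blob)] = blob
    name = shm.name
    shm.close()  # drop our mapping; the named segment stays alive for the consumer
    queue_put((rid, name, mm_inputs))
    return name

def consume_from_shm(item):
    """Sketch of the consumer side: reattach by name, unpickle, then unlink."""
    rid, shm_name, mm_inputs = item
    shm = shared_memory.SharedMemory(name=shm_name)
    metadata = pickle.loads(bytes(shm.buf))  # pickle stops at its STOP opcode
    shm.close()
    shm.unlink()  # consumer owns cleanup of the one-shot segment
    return rid, metadata, mm_inputs
```

In this sketch the GPU-tensor ``mm_inputs`` rides the queue untouched; under ``tensor_transport_mode="cuda_ipc"`` it would instead be a proxy object whose pickle carries CUDA IPC handles rather than tensor data.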