pymllm.orchestrator.tokenizer_process
=====================================

.. py:module:: pymllm.orchestrator.tokenizer_process

.. autoapi-nested-parse::

   TokenizerProcess -- subprocess that tokenizes incoming raw requests.

   Receives raw requests from RequestResponseProcess via ZMQ, tokenizes them,
   and forwards the tokenized payloads to the SchedulerProcess.

   Supports two transport modes (controlled by ``enable_shared_queue`` and
   ``tensor_transport_mode`` in the tokenizer config):

   1. **Legacy ZMQ path** (``enable_shared_queue=False``): Tokenized objects
      are sent directly via ZMQ ``send_pyobj`` (pickle). This is simple but
      slow for large multimodal tensors.

   2. **Shared queue fast path** (``enable_shared_queue=True``): Metadata is
      written to POSIX shared memory and the queue carries a lightweight
      ``(rid, shm_name, mm_inputs)`` tuple. The GPU tensors inside
      ``mm_inputs`` are transported differently depending on
      ``tensor_transport_mode``:

      * ``"default"`` -- GPU tensors are moved to CPU first (GPU→CPU copy),
        then placed in POSIX shared memory.
      * ``"cuda_ipc"`` -- GPU tensors stay on the GPU; they are wrapped in a
        :class:`TransportProxyTensor` whose pickle uses CUDA IPC handles.
        Simple, but may leak GPU memory.
      * ``"cuda_ipc_pool"`` -- GPU tensors are copied into a pre-allocated
        :class:`MmItemMemoryPool` workspace and shared via pool-chunk IPC
        handles. Chunks are recycled, so no GPU memory is leaked.

Attributes
----------

.. autoapisummary::

   pymllm.orchestrator.tokenizer_process.logger

Classes
-------

.. autoapisummary::

   pymllm.orchestrator.tokenizer_process.TokenizerProcess

Functions
---------

.. autoapisummary::

   pymllm.orchestrator.tokenizer_process.run_tokenizer_process

Module Contents
---------------

.. py:data:: logger

.. py:class:: TokenizerProcess(recv_from_rr_addr, send_to_scheduler_addr, tokenizer_cfg, shared_queue = None)

   Runs inside a subprocess spawned by ``torch.multiprocessing``.

   .. py:method:: init_sockets()

   .. py:method:: event_loop()

      Infinite loop: recv raw request -> tokenize -> send to scheduler.

   .. py:method:: shutdown()

.. py:function:: run_tokenizer_process(recv_from_rr_addr, send_to_scheduler_addr, pipe_writer, tokenizer_cfg, shared_queue = None)

   Entry point for ``torch.multiprocessing.Process(target=...)``.
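The shared-queue fast path described above (metadata in POSIX shared memory, a lightweight ``(rid, shm_name, mm_inputs)`` tuple on the queue) can be sketched with the standard library. The function names ``publish_via_shm`` and ``consume_from_shm`` are illustrative, not part of the module's real API, and Python's ``multiprocessing.shared_memory`` stands in for whatever shared-memory layer pymllm actually uses:

```python
import pickle
from multiprocessing import shared_memory

def publish_via_shm(rid, metadata, mm_inputs, queue_put):
    """Sketch of the producer side (hypothetical helper, not the real API).

    Pickles the tokenized metadata into a POSIX shared-memory segment and
    enqueues only a lightweight (rid, shm_name, mm_inputs) tuple, so the
    queue never carries the bulky payload itself.
    """
    blob = pickle.dumps(metadata)
    shm = shared_memory.SharedMemory(create=True, size=len(blob))
    shm.buf[: len(blob)] = blob
    name = shm.name
    shm.close()  # drop our mapping; the named segment stays alive for the consumer
    queue_put((rid, name, mm_inputs))
    return name

def consume_from_shm(item):
    """Sketch of the consumer side: reattach by name, unpickle, then unlink."""
    rid, shm_name, mm_inputs = item
    shm = shared_memory.SharedMemory(name=shm_name)
    metadata = pickle.loads(bytes(shm.buf))  # pickle stops at its STOP opcode
    shm.close()
    shm.unlink()  # consumer owns cleanup of the one-shot segment
    return rid, metadata, mm_inputs
```

In this sketch the GPU-tensor ``mm_inputs`` rides the queue untouched; under ``tensor_transport_mode="cuda_ipc"`` it would instead be a proxy object whose pickle carries CUDA IPC handles rather than tensor data.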