pymllm.orchestrator.parallel_state

Minimal parallel state for single-GPU serving.

pymllm targets single-GPU, high-concurrency inference. This module keeps the tensor-parallel (TP), data-parallel (DP), and pipeline-parallel (PP) scaffolding so the rest of the codebase can query ranks and process groups uniformly, but the default (and expected) case is world_size=1.

Module Contents

pymllm.orchestrator.parallel_state.logger
Module-level logger instance.
pymllm.orchestrator.parallel_state.initialize_model_parallel(tensor_model_parallel_size=1, data_parallel_size=1, pipeline_model_parallel_size=1, backend='nccl')
Parameters:
  • tensor_model_parallel_size (int)

  • data_parallel_size (int)

  • pipeline_model_parallel_size (int)

  • backend (str)

Return type:

None

pymllm.orchestrator.parallel_state.get_tp_group()
Return type:

Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]

pymllm.orchestrator.parallel_state.get_dp_group()
Return type:

Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]

pymllm.orchestrator.parallel_state.get_pp_group()
Return type:

Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]
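
Because all three group getters return Optional[GroupCoordinator], call sites should guard for the single-GPU case where no group was ever created. A hedged sketch of that pattern, using a stand-in class and a getter hard-wired to the assumed single-GPU behaviour:

```python
from typing import Optional

class GroupCoordinator:  # stand-in for the real class, illustration only
    def __init__(self, rank: int = 0) -> None:
        self.rank = rank

def get_tp_group() -> Optional[GroupCoordinator]:
    # Single-GPU default: initialize_model_parallel() with all sizes
    # at 1 creates no process groups, so the getter returns None.
    # (Assumed behaviour, inferred from the Optional return type.)
    return None

# Guard for None rather than assuming a group exists.
tp = get_tp_group()
tp_rank = tp.rank if tp is not None else 0
print(tp_rank)  # 0
```

The rank/world-size getters below encapsulate exactly this fallback, so they are usually preferable to touching the group objects directly.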

pymllm.orchestrator.parallel_state.get_tensor_model_parallel_rank()
Return type:

int

pymllm.orchestrator.parallel_state.get_tensor_model_parallel_world_size()
Return type:

int

pymllm.orchestrator.parallel_state.get_data_parallel_rank()
Return type:

int

pymllm.orchestrator.parallel_state.get_data_parallel_world_size()
Return type:

int

pymllm.orchestrator.parallel_state.get_pipeline_model_parallel_rank()
Return type:

int

pymllm.orchestrator.parallel_state.get_pipeline_model_parallel_world_size()
Return type:

int

pymllm.orchestrator.parallel_state.model_parallel_is_initialized()
Return type:

bool
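
A common use of this predicate is to make initialization idempotent, so that multiple entry points can call initialize_model_parallel without coordination. A sketch with stand-in module state (not pymllm's internals; the early-return-if-initialized behaviour is an assumption):

```python
_initialized = False

def model_parallel_is_initialized() -> bool:
    return _initialized

def initialize_model_parallel(tensor_model_parallel_size: int = 1,
                              data_parallel_size: int = 1,
                              pipeline_model_parallel_size: int = 1,
                              backend: str = "nccl") -> None:
    global _initialized
    if _initialized:
        return  # already set up: make repeat calls harmless
    # Real code would create TP/DP/PP process groups here when sizes > 1.
    _initialized = True

initialize_model_parallel()   # first call does the setup
initialize_model_parallel()   # second call is a no-op
print(model_parallel_is_initialized())  # True
```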

pymllm.orchestrator.parallel_state.tensor_model_parallel_all_reduce(tensor)
Parameters:
  • tensor (torch.Tensor)

Return type:

torch.Tensor

pymllm.orchestrator.parallel_state.tensor_model_parallel_all_gather(tensor, dim=0)
Parameters:
  • tensor (torch.Tensor)

  • dim (int)

Return type:

torch.Tensor

pymllm.orchestrator.parallel_state.data_parallel_all_reduce(tensor)
Parameters:
  • tensor (torch.Tensor)

Return type:

torch.Tensor
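
On a single GPU these collectives degenerate to identity operations: with a world size of 1 there is nothing to reduce across and only one shard to gather. A pure-Python sketch of that fast path (plain lists stand in for torch.Tensor, and the function bodies are assumptions; pymllm's actual multi-rank implementation would go through its GroupCoordinator):

```python
def tensor_model_parallel_all_reduce(tensor, world_size: int = 1):
    if world_size == 1:
        return tensor  # one rank: nothing to sum across, identity
    raise NotImplementedError("multi-rank path would use the process group")

def tensor_model_parallel_all_gather(tensor, dim: int = 0, world_size: int = 1):
    if world_size == 1:
        return tensor  # one shard: the gather is the shard itself
    raise NotImplementedError("multi-rank path would use the process group")

x = [1.0, 2.0, 3.0]  # stands in for a torch.Tensor
print(tensor_model_parallel_all_reduce(x) is x)         # True
print(tensor_model_parallel_all_gather(x, dim=0) is x)  # True
```

This is why model code can call the collectives unconditionally: in the expected world_size=1 deployment they cost nothing beyond a function call.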