pymllm.orchestrator.parallel_state

Minimal parallel state for single-GPU serving.

pymllm targets single-GPU, high-concurrency inference. This module keeps the tensor-parallel (TP), data-parallel (DP), and pipeline-parallel (PP) scaffolding so the rest of the codebase can query ranks and process groups uniformly, but the default (and expected) case is world_size=1.

Module Contents

pymllm.orchestrator.parallel_state.logger
Module-level logger instance.
pymllm.orchestrator.parallel_state.initialize_model_parallel(tensor_model_parallel_size=1, data_parallel_size=1, pipeline_model_parallel_size=1, backend='nccl')
Parameters:
  • tensor_model_parallel_size (int)

  • data_parallel_size (int)

  • pipeline_model_parallel_size (int)

  • backend (str)

Return type:

None

pymllm.orchestrator.parallel_state.get_tp_group()
Return type:

Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]

pymllm.orchestrator.parallel_state.get_dp_group()
Return type:

Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]

pymllm.orchestrator.parallel_state.get_pp_group()
Return type:

Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]
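
Because all three group getters return Optional[GroupCoordinator], call sites should guard for the single-GPU case where no group was ever created. A hedged sketch of that pattern, using a stand-in class and a getter hard-wired to the assumed single-GPU behaviour:

```python
from typing import Optional

class GroupCoordinator:  # stand-in for the real class, illustration only
    def __init__(self, rank: int = 0) -> None:
        self.rank = rank

def get_tp_group() -> Optional[GroupCoordinator]:
    # Single-GPU default: initialize_model_parallel() with all sizes
    # at 1 creates no process groups, so the getter returns None.
    # (Assumed behaviour, inferred from the Optional return type.)
    return None

# Guard for None rather than assuming a group exists.
tp = get_tp_group()
tp_rank = tp.rank if tp is not None else 0
print(tp_rank)  # 0
```

The rank/world-size getters below encapsulate exactly this fallback, so they are usually preferable to touching the group objects directly.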

pymllm.orchestrator.parallel_state.get_tensor_model_parallel_rank()
Return type:

int

pymllm.orchestrator.parallel_state.get_tensor_model_parallel_world_size()
Return type:

int

pymllm.orchestrator.parallel_state.get_data_parallel_rank()
Return type:

int

pymllm.orchestrator.parallel_state.get_data_parallel_world_size()
Return type:

int

pymllm.orchestrator.parallel_state.get_pipeline_model_parallel_rank()
Return type:

int

pymllm.orchestrator.parallel_state.get_pipeline_model_parallel_world_size()
Return type:

int

pymllm.orchestrator.parallel_state.model_parallel_is_initialized()
Return type:

bool
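
A common use of this predicate is to make initialization idempotent, so that multiple entry points can call initialize_model_parallel without coordination. A sketch with stand-in module state (not pymllm's internals; the early-return-if-initialized behaviour is an assumption):

```python
_initialized = False

def model_parallel_is_initialized() -> bool:
    return _initialized

def initialize_model_parallel(tensor_model_parallel_size: int = 1,
                              data_parallel_size: int = 1,
                              pipeline_model_parallel_size: int = 1,
                              backend: str = "nccl") -> None:
    global _initialized
    if _initialized:
        return  # already set up: make repeat calls harmless
    # Real code would create TP/DP/PP process groups here when sizes > 1.
    _initialized = True

initialize_model_parallel()   # first call does the setup
initialize_model_parallel()   # second call is a no-op
print(model_parallel_is_initialized())  # True
```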

pymllm.orchestrator.parallel_state.tensor_model_parallel_all_reduce(tensor)
Parameters:
  • tensor (torch.Tensor)

Return type:

torch.Tensor

pymllm.orchestrator.parallel_state.tensor_model_parallel_all_gather(tensor, dim=0)
Parameters:
  • tensor (torch.Tensor)

  • dim (int)

Return type:

torch.Tensor

pymllm.orchestrator.parallel_state.data_parallel_all_reduce(tensor)
Parameters:
  • tensor (torch.Tensor)

Return type:

torch.Tensor
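
On a single GPU these collectives degenerate to identity operations: with a world size of 1 there is nothing to reduce across and only one shard to gather. A pure-Python sketch of that fast path (plain lists stand in for torch.Tensor, and the function bodies are assumptions; pymllm's actual multi-rank implementation would go through its GroupCoordinator):

```python
def tensor_model_parallel_all_reduce(tensor, world_size: int = 1):
    if world_size == 1:
        return tensor  # one rank: nothing to sum across, identity
    raise NotImplementedError("multi-rank path would use the process group")

def tensor_model_parallel_all_gather(tensor, dim: int = 0, world_size: int = 1):
    if world_size == 1:
        return tensor  # one shard: the gather is the shard itself
    raise NotImplementedError("multi-rank path would use the process group")

x = [1.0, 2.0, 3.0]  # stands in for a torch.Tensor
print(tensor_model_parallel_all_reduce(x) is x)         # True
print(tensor_model_parallel_all_gather(x, dim=0) is x)  # True
```

This is why model code can call the collectives unconditionally: in the expected world_size=1 deployment they cost nothing beyond a function call.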