pymllm.orchestrator.parallel_state
Minimal parallel state for single-GPU serving.
pymllm targets single-GPU, high-concurrency inference. This module keeps the tensor-parallel (TP), data-parallel (DP), and pipeline-parallel (PP) scaffolding so that the rest of the codebase can query ranks and groups uniformly, but the default (and expected) case is world_size=1.
Attributes
- logger

Functions
- initialize_model_parallel
- get_tp_group
- get_dp_group
- get_pp_group
- get_tensor_model_parallel_rank
- get_tensor_model_parallel_world_size
- get_data_parallel_rank
- get_data_parallel_world_size
- get_pipeline_model_parallel_rank
- get_pipeline_model_parallel_world_size
- model_parallel_is_initialized
- tensor_model_parallel_all_reduce
- tensor_model_parallel_all_gather
- data_parallel_all_reduce
Module Contents
- pymllm.orchestrator.parallel_state.logger
- pymllm.orchestrator.parallel_state.initialize_model_parallel(tensor_model_parallel_size=1, data_parallel_size=1, pipeline_model_parallel_size=1, backend='nccl')
- Parameters:
tensor_model_parallel_size (int)
data_parallel_size (int)
pipeline_model_parallel_size (int)
backend (str)
- Return type:
None
- pymllm.orchestrator.parallel_state.get_tp_group()
- Return type:
Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]
- pymllm.orchestrator.parallel_state.get_dp_group()
- Return type:
Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]
- pymllm.orchestrator.parallel_state.get_pp_group()
- Return type:
Optional[pymllm.orchestrator.group_coordinator.GroupCoordinator]
- pymllm.orchestrator.parallel_state.get_tensor_model_parallel_rank()
- Return type:
int
- pymllm.orchestrator.parallel_state.get_tensor_model_parallel_world_size()
- Return type:
int
- pymllm.orchestrator.parallel_state.get_data_parallel_rank()
- Return type:
int
- pymllm.orchestrator.parallel_state.get_data_parallel_world_size()
- Return type:
int
- pymllm.orchestrator.parallel_state.get_pipeline_model_parallel_rank()
- Return type:
int
- pymllm.orchestrator.parallel_state.get_pipeline_model_parallel_world_size()
- Return type:
int
- pymllm.orchestrator.parallel_state.model_parallel_is_initialized()
- Return type:
bool
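A minimal, self-contained sketch of how the initializer and the query functions above fit together in the default single-GPU case. This is hypothetical stand-in code: the real module wraps torch.distributed process groups, and only the names and signatures below mirror the listed API.

```python
# Hypothetical sketch: module-level bookkeeping for the world_size == 1 case.
# The real parallel_state builds torch.distributed groups; this stand-in
# only models the state that the query functions expose.

_TP_SIZE = None
_DP_SIZE = None
_PP_SIZE = None

def initialize_model_parallel(tensor_model_parallel_size=1,
                              data_parallel_size=1,
                              pipeline_model_parallel_size=1,
                              backend="nccl"):
    """Record the requested group sizes (all 1 by default)."""
    global _TP_SIZE, _DP_SIZE, _PP_SIZE
    _TP_SIZE = tensor_model_parallel_size
    _DP_SIZE = data_parallel_size
    _PP_SIZE = pipeline_model_parallel_size

def model_parallel_is_initialized():
    return _TP_SIZE is not None

def get_tensor_model_parallel_world_size():
    # Before initialization, fall back to the single-GPU default of 1.
    return _TP_SIZE if _TP_SIZE is not None else 1

def get_tensor_model_parallel_rank():
    # A single process is rank 0 in every group.
    return 0

# Typical single-GPU setup: defaults everywhere.
initialize_model_parallel()
assert model_parallel_is_initialized()
assert get_tensor_model_parallel_world_size() == 1
assert get_tensor_model_parallel_rank() == 0
```

The DP and PP query functions follow the same pattern over `_DP_SIZE` and `_PP_SIZE`.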
- pymllm.orchestrator.parallel_state.tensor_model_parallel_all_reduce(tensor)
- Parameters:
tensor (torch.Tensor)
- Return type:
torch.Tensor
- pymllm.orchestrator.parallel_state.tensor_model_parallel_all_gather(tensor, dim=0)
- Parameters:
tensor (torch.Tensor)
dim (int)
- Return type:
torch.Tensor
- pymllm.orchestrator.parallel_state.data_parallel_all_reduce(tensor)
- Parameters:
tensor (torch.Tensor)
- Return type:
torch.Tensor
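With world_size=1, every collective degenerates to the identity: a sum-reduction over one rank, or a gather of one shard, just returns the input. A hypothetical sketch of that fast path (plain Python lists stand in for torch.Tensor; the multi-rank branches are placeholders, not the real implementation):

```python
def tensor_model_parallel_all_reduce(tensor, world_size=1):
    # Sum-reduction across a single rank is the identity, so the
    # single-GPU fast path returns the tensor without any communication.
    if world_size == 1:
        return tensor
    raise NotImplementedError(
        "multi-rank path would call torch.distributed.all_reduce")

def tensor_model_parallel_all_gather(tensor, dim=0, world_size=1):
    # Gathering a single shard along `dim` reproduces the input unchanged.
    if world_size == 1:
        return tensor
    raise NotImplementedError(
        "multi-rank path would concatenate shards along `dim`")

t = [1.0, 2.0, 3.0]  # stand-in for a torch.Tensor
assert tensor_model_parallel_all_reduce(t) is t
assert tensor_model_parallel_all_gather(t, dim=0) is t
```

data_parallel_all_reduce follows the same pattern over the DP group instead of the TP group.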