pymllm.bench_one_batch¶
SGLang-style one-batch benchmark for pymllm.
This module intentionally bypasses the HTTP server, tokenizer workers,
scheduler, and detokenizer. It drives pymllm.executor.ModelRunner
directly to measure one static prefill followed by token-by-token decode.
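The measurement pattern described above (one timed prefill, then token-by-token decode timings) can be sketched generically. The `step` callable below is a stand-in for the model forward pass; the real module drives `pymllm.executor.ModelRunner` instead:

```python
import time

def bench_one_batch(step, output_len):
    """Time one prefill call, then per-token decode calls.

    `step` is a dummy stand-in for a model forward pass.
    """
    t0 = time.perf_counter()
    step("prefill")                      # one static prefill over the whole prompt
    prefill_latency = time.perf_counter() - t0

    decode_latencies = []
    for _ in range(output_len):          # one forward pass per generated token
        t0 = time.perf_counter()
        step("decode")
        decode_latencies.append(time.perf_counter() - t0)
    return prefill_latency, decode_latencies

prefill, decodes = bench_one_batch(lambda stage: None, output_len=4)
```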
Module Contents¶
- pymllm.bench_one_batch.logger¶
- class pymllm.bench_one_batch.BenchArgs¶
- run_name: str = 'default'¶
- batch_size: list[int] = [1]¶
- input_len: list[int] = [256, 512, 1024]¶
- output_len: list[int] = [128]¶
- result_filename: pathlib.Path¶
- log_decode_step: int = 0¶
- seed: int = 42¶
- profile: bool = False¶
- profile_record_shapes: bool = False¶
- profile_activities: list[str] = ['CPU', 'GPU']¶
- profile_stage: str = 'all'¶
- profile_filename_prefix: str = 'pymllm_profile'¶
- profile_start_step: int | None = None¶
- profile_steps: int = 1¶
- skip_warmup: bool = False¶
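Assembled from the attributes above, the configuration is equivalent to a dataclass like the following (a reconstruction for illustration; the `result_filename` default is not shown in the docs, so a placeholder is used):

```python
from __future__ import annotations

import pathlib
from dataclasses import dataclass, field

@dataclass
class BenchArgs:
    run_name: str = "default"
    batch_size: list[int] = field(default_factory=lambda: [1])
    input_len: list[int] = field(default_factory=lambda: [256, 512, 1024])
    output_len: list[int] = field(default_factory=lambda: [128])
    # Actual default not documented above; placeholder value.
    result_filename: pathlib.Path = pathlib.Path("result.jsonl")
    log_decode_step: int = 0
    seed: int = 42
    profile: bool = False
    profile_record_shapes: bool = False
    profile_activities: list[str] = field(default_factory=lambda: ["CPU", "GPU"])
    profile_stage: str = "all"
    profile_filename_prefix: str = "pymllm_profile"
    profile_start_step: int | None = None
    profile_steps: int = 1
    skip_warmup: bool = False
```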
- class pymllm.bench_one_batch.DecodeState¶
- req_pool_indices: torch.Tensor¶
- seq_lens: torch.Tensor¶
- mrope_position_deltas: torch.Tensor | None = None¶
- pymllm.bench_one_batch.add_bench_args(parser)¶
- Parameters:
parser (argparse.ArgumentParser)
- Return type:
argparse.ArgumentParser
- pymllm.bench_one_batch.make_parser()¶
- Return type:
argparse.ArgumentParser
- pymllm.bench_one_batch.parse_args(argv=None)¶
- Parameters:
argv (Optional[Sequence[str]])
- Return type:
BenchArgs
- pymllm.bench_one_batch.generate_settings(args)¶
- Parameters:
args (BenchArgs)
- Return type:
list[BenchSetting]
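`generate_settings` presumably expands the `batch_size` × `input_len` × `output_len` lists into one setting per combination; a minimal sketch of that cross product, with plain tuples standing in for `BenchSetting`:

```python
import itertools

def generate_settings(batch_sizes, input_lens, output_lens):
    # One (batch_size, input_len, output_len) tuple per combination.
    return list(itertools.product(batch_sizes, input_lens, output_lens))

# With the BenchArgs defaults above: 1 * 3 * 1 = 3 settings.
settings = generate_settings([1], [256, 512, 1024], [128])
```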
- pymllm.bench_one_batch.make_synthetic_input_ids(*, batch_size, input_len, vocab_size, seed, device)¶
- Parameters:
batch_size (int)
input_len (int)
vocab_size (int)
seed (int)
device (str | torch.device)
- Return type:
torch.Tensor
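The contract of `make_synthetic_input_ids` — reproducible random token ids of shape `(batch_size, input_len)` drawn from `[0, vocab_size)` — can be sketched without torch (the real function returns a `torch.Tensor` on the given device; nested lists stand in here):

```python
import random

def make_synthetic_input_ids(*, batch_size, input_len, vocab_size, seed):
    rng = random.Random(seed)  # seeding makes every run reproducible
    return [
        [rng.randrange(vocab_size) for _ in range(input_len)]
        for _ in range(batch_size)
    ]

ids = make_synthetic_input_ids(batch_size=2, input_len=8, vocab_size=32000, seed=42)
```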
- pymllm.bench_one_batch.summarize_latencies(*, setting, prefill_latency, decode_latencies, run_name, device, dtype, cuda_graph, extra=None)¶
- Parameters:
setting (BenchSetting)
prefill_latency (float)
decode_latencies (Sequence[float])
run_name (str)
device (str)
dtype (str)
cuda_graph (bool)
extra (Optional[dict[str, Any]])
- Return type:
dict[str, Any]
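A plausible shape for the summary dict — per-stage latency plus throughput derived from the setting's token counts — is sketched below. The key names and exact metrics are assumptions; only the inputs match the documented signature:

```python
import statistics

def summarize_latencies(*, batch_size, input_len, prefill_latency, decode_latencies):
    median_decode = statistics.median(decode_latencies)
    return {
        "prefill_latency_s": prefill_latency,
        # tokens processed during prefill divided by the time it took
        "prefill_throughput_tok_s": batch_size * input_len / prefill_latency,
        "median_decode_latency_s": median_decode,
        # decode produces one token per request per step
        "decode_throughput_tok_s": batch_size / median_decode,
    }

summary = summarize_latencies(batch_size=1, input_len=256,
                              prefill_latency=0.5,
                              decode_latencies=[0.01, 0.02, 0.01])
```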
- pymllm.bench_one_batch.make_profile_trace_path(*, output_dir, prefix, run_name, setting, stage, step=None)¶
- Parameters:
output_dir (pathlib.Path)
prefix (str)
run_name (str)
setting (BenchSetting)
stage (str)
step (Optional[int])
- Return type:
pathlib.Path
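A sketch of how such a trace path might be composed from the documented arguments; the naming scheme below is hypothetical, and a tuple stands in for `BenchSetting`:

```python
import pathlib

def make_profile_trace_path(*, output_dir, prefix, run_name, batch_size,
                            input_len, stage, step=None):
    # Hypothetical scheme: <prefix>_<run>_bs<B>_len<L>_<stage>[_step<N>].json
    parts = [prefix, run_name, f"bs{batch_size}", f"len{input_len}", stage]
    if step is not None:
        parts.append(f"step{step}")
    return pathlib.Path(output_dir) / ("_".join(parts) + ".json")

p = make_profile_trace_path(output_dir="traces", prefix="pymllm_profile",
                            run_name="default", batch_size=1, input_len=256,
                            stage="decode", step=3)
```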
- class pymllm.bench_one_batch.PymllmBenchRunner(runner)¶
- Parameters:
runner (pymllm.executor.ModelRunner)
- runner¶
- device¶
- classmethod create(cfg)¶
- Parameters:
cfg
- Return type:
PymllmBenchRunner
- clear()¶
- Return type:
None
- extend(input_ids)¶
- Parameters:
input_ids (torch.Tensor)
- Return type:
tuple[torch.Tensor, DecodeState]
- decode(input_ids, state)¶
- Parameters:
input_ids (torch.Tensor)
state (DecodeState)
- Return type:
tuple[torch.Tensor, DecodeState]
- shutdown()¶
- Return type:
None
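Given the `extend`/`decode` signatures above, the driving loop presumably looks like the following. The dummy runner only mimics the shapes of the return values (the real methods return a `torch.Tensor` of next-token ids plus a `DecodeState`):

```python
class DummyBenchRunner:
    """Stands in for PymllmBenchRunner; emits fake next-token ids."""

    def clear(self):
        pass  # the real method resets KV cache / request pool state

    def extend(self, input_ids):
        # One next-token id per request, plus opaque decode state.
        return [0 for _ in input_ids], {"seq_lens": [len(r) for r in input_ids]}

    def decode(self, input_ids, state):
        state["seq_lens"] = [n + 1 for n in state["seq_lens"]]
        return [0 for _ in input_ids], state

def run_one_batch(runner, input_ids, output_len):
    runner.clear()                              # start from a clean cache
    next_ids, state = runner.extend(input_ids)  # one static prefill
    for _ in range(output_len - 1):             # then token-by-token decode
        next_ids, state = runner.decode(next_ids, state)
    return state

state = run_one_batch(DummyBenchRunner(), [[1, 2, 3]], output_len=4)
```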
- pymllm.bench_one_batch.run_single_setting(*, bench_runner, args, setting, seed, record_result)¶
- Parameters:
bench_runner (PymllmBenchRunner)
args (BenchArgs)
setting (BenchSetting)
seed (int)
record_result (bool)
- Return type:
Optional[dict[str, Any]]
- pymllm.bench_one_batch.run_benchmark(cfg, args)¶
- Parameters:
cfg
args (BenchArgs)
- Return type:
list[dict[str, Any]]
- pymllm.bench_one_batch.main(argv=None)¶
- Parameters:
argv (Optional[Sequence[str]])
- Return type:
None