bench_w8a8_activation_quant¶
Benchmark W8A8 activation quantization implementations.
Covers the current torch path and, in a future phase, a Triton kernel. The script is reusable across phases.
- Usage:
python pymllm/tests/bench_w8a8_activation_quant.py
Functions¶
| torch_per_token_quant_int8(x) | Current torch-based activation quantization. |
| bench_fn(fn, args[, warmup, repeat]) | Returns median latency in ms. |
| run_benchmarks() | |
Module Contents¶
- bench_w8a8_activation_quant.torch_per_token_quant_int8(x)¶
Current torch-based activation quantization.
- Parameters:
x (torch.Tensor)
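The docstring does not spell out the quantization scheme; a minimal sketch of standard symmetric per-token int8 quantization (one scale per row, `scale = max(|row|) / 127`) might look like the following. This is an illustration of the technique, not the script's exact implementation:

```python
import torch

def torch_per_token_quant_int8(x: torch.Tensor):
    # Per-token (per-row) symmetric int8 quantization:
    # each row gets its own scale = max(|row|) / 127.
    absmax = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
    scale = absmax / 127.0
    q = torch.round(x / scale).clamp(-128, 127).to(torch.int8)
    return q, scale
```

Dequantizing with `q.float() * scale` recovers `x` up to a per-row error bounded by roughly half the row's scale.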
- bench_w8a8_activation_quant.bench_fn(fn, args, warmup=5, repeat=20)¶
Returns median latency in ms.
- Return type:
float
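A timing helper with this signature could be sketched as below. This is a hedged, device-agnostic version using wall-clock timing; the actual script may use CUDA events or `torch.cuda.synchronize()` for GPU-accurate measurements:

```python
import time
import statistics

def bench_fn(fn, args, warmup=5, repeat=20):
    # Warmup iterations exclude one-time costs (compilation, cache fills).
    for _ in range(warmup):
        fn(*args)
    # Timed iterations; the median is robust to outliers from scheduler jitter.
    times_ms = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(times_ms)
```

Reporting the median rather than the mean keeps a single slow iteration (e.g. a page fault or context switch) from skewing the result.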
- bench_w8a8_activation_quant.run_benchmarks()¶