bench_w8a8_activation_quant

Benchmark W8A8 activation quantization implementations.

Covers the current torch path and, in a later phase, a Triton kernel. The script is reusable across phases.

Usage:

python pymllm/tests/bench_w8a8_activation_quant.py

Functions

torch_per_token_quant_int8(x)

Current torch-based per-token int8 activation quantization.

bench_fn(fn, args[, warmup, repeat])

Returns median latency in ms.

run_benchmarks()

Module Contents

bench_w8a8_activation_quant.torch_per_token_quant_int8(x)

Current torch-based per-token int8 activation quantization.

Parameters:

x (torch.Tensor)
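The module's actual implementation is not reproduced here; the following is a minimal sketch of symmetric per-token int8 quantization, which the function name suggests. The helper name, the max-abs/127 scaling rule, and the clamp epsilon are assumptions for illustration, not the module's code:

```python
import torch

def per_token_quant_int8_sketch(x: torch.Tensor):
    """Illustrative symmetric per-token int8 quantization.

    Each row (token) gets its own scale = max-abs / 127, so an outlier
    in one token does not degrade the precision of other tokens.
    """
    absmax = x.abs().amax(dim=-1, keepdim=True)
    scales = (absmax / 127.0).clamp(min=1e-8)  # epsilon guards all-zero rows
    q = (x / scales).round().clamp(-128, 127).to(torch.int8)
    return q, scales
```

Dequantization is then `q.float() * scales`, which recovers `x` up to half a scale step per element.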

bench_w8a8_activation_quant.bench_fn(fn, args, warmup=5, repeat=20)

Returns median latency in ms.

Return type:

float
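A median-of-repeats timer with warmup matches this signature; the sketch below uses a plain CPU wall-clock timer, whereas the real script may synchronize CUDA around each call (an assumption either way):

```python
import statistics
import time

def bench_fn(fn, args, warmup=5, repeat=20):
    """Return the median latency of fn(*args) in milliseconds.

    Warmup iterations absorb one-time costs (allocator, JIT, caches);
    the median of the timed repeats is robust to stray slow runs.
    """
    for _ in range(warmup):
        fn(*args)
    times_ms = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times_ms)
```

The median is preferred over the mean here because a single preempted iteration would otherwise skew the result.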

bench_w8a8_activation_quant.run_benchmarks()
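The source does not describe what `run_benchmarks()` does beyond its name; a plausible harness pairs the quantization implementations with `bench_fn` over a few input shapes and prints the results. Everything below (the `_quant`/`_bench` stand-ins, the shapes, the output format) is assumed for illustration:

```python
import statistics
import time

import torch

def _quant(x):
    # Stand-in for the torch per-token int8 quant path (assumption).
    s = (x.abs().amax(dim=-1, keepdim=True) / 127.0).clamp(min=1e-8)
    return (x / s).round().clamp(-128, 127).to(torch.int8), s

def _bench(fn, args, warmup=5, repeat=20):
    # Median latency in ms, as bench_fn is documented to return.
    for _ in range(warmup):
        fn(*args)
    ts = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        ts.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(ts)

def run_benchmarks():
    # Assumed example shapes: (tokens, hidden_dim).
    for shape in [(1, 4096), (32, 4096)]:
        x = torch.randn(shape)
        print(f"torch {shape}: {_bench(_quant, (x,)):.3f} ms")
```

When the Triton kernel lands, it would be added as a second entry next to the torch path so both are timed on identical inputs.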