bench_w8a8_activation_quant

Benchmark W8A8 activation quantization implementations.

Covers the current torch path and, in a later phase, a Triton kernel. The script is reusable across phases.

Usage:

python pymllm/tests/bench_w8a8_activation_quant.py

Functions

torch_per_token_quant_int8(x)

Current torch-based per-token int8 activation quantization.

bench_fn(fn, args[, warmup, repeat])

Returns median latency in ms.

run_benchmarks()

Module Contents

bench_w8a8_activation_quant.torch_per_token_quant_int8(x)

Current torch-based per-token int8 activation quantization.

Parameters:

x (torch.Tensor)
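The module's actual implementation is not reproduced here; the following is a minimal sketch of symmetric per-token int8 quantization, which the function name suggests. The helper name, the max-abs/127 scaling rule, and the clamp epsilon are assumptions for illustration, not the module's code:

```python
import torch

def per_token_quant_int8_sketch(x: torch.Tensor):
    """Illustrative symmetric per-token int8 quantization.

    Each row (token) gets its own scale = max-abs / 127, so an outlier
    in one token does not degrade the precision of other tokens.
    """
    absmax = x.abs().amax(dim=-1, keepdim=True)
    scales = (absmax / 127.0).clamp(min=1e-8)  # epsilon guards all-zero rows
    q = (x / scales).round().clamp(-128, 127).to(torch.int8)
    return q, scales
```

Dequantization is then `q.float() * scales`, which recovers `x` up to half a scale step per element.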

bench_w8a8_activation_quant.bench_fn(fn, args, warmup=5, repeat=20)

Returns median latency in ms.

Return type:

float
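A median-of-repeats timer with warmup matches this signature; the sketch below uses a plain CPU wall-clock timer, whereas the real script may synchronize CUDA around each call (an assumption either way):

```python
import statistics
import time

def bench_fn(fn, args, warmup=5, repeat=20):
    """Return the median latency of fn(*args) in milliseconds.

    Warmup iterations absorb one-time costs (allocator, JIT, caches);
    the median of the timed repeats is robust to stray slow runs.
    """
    for _ in range(warmup):
        fn(*args)
    times_ms = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times_ms)
```

The median is preferred over the mean here because a single preempted iteration would otherwise skew the result.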

bench_w8a8_activation_quant.run_benchmarks()
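The source does not describe what `run_benchmarks()` does beyond its name; a plausible harness pairs the quantization implementations with `bench_fn` over a few input shapes and prints the results. Everything below (the `_quant`/`_bench` stand-ins, the shapes, the output format) is assumed for illustration:

```python
import statistics
import time

import torch

def _quant(x):
    # Stand-in for the torch per-token int8 quant path (assumption).
    s = (x.abs().amax(dim=-1, keepdim=True) / 127.0).clamp(min=1e-8)
    return (x / s).round().clamp(-128, 127).to(torch.int8), s

def _bench(fn, args, warmup=5, repeat=20):
    # Median latency in ms, as bench_fn is documented to return.
    for _ in range(warmup):
        fn(*args)
    ts = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        ts.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(ts)

def run_benchmarks():
    # Assumed example shapes: (tokens, hidden_dim).
    for shape in [(1, 4096), (32, 4096)]:
        x = torch.randn(shape)
        print(f"torch {shape}: {_bench(_quant, (x,)):.3f} ms")
```

When the Triton kernel lands, it would be added as a second entry next to the torch path so both are timed on identical inputs.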