bench_w8a8_activation_quant
===========================

.. py:module:: bench_w8a8_activation_quant

.. autoapi-nested-parse::

   Benchmark W8A8 activation quantization implementations.

   Covers: torch path (current) and (future) Triton kernel.
   This script is reusable across phases.

   Usage::

       python pymllm/tests/bench_w8a8_activation_quant.py

Functions
---------

.. autoapisummary::

   bench_w8a8_activation_quant.torch_per_token_quant_int8
   bench_w8a8_activation_quant.bench_fn
   bench_w8a8_activation_quant.run_benchmarks

Module Contents
---------------

.. py:function:: torch_per_token_quant_int8(x)

   Current torch-based activation quantization.

.. py:function:: bench_fn(fn, args, warmup=5, repeat=20)

   Returns median latency in ms.

.. py:function:: run_benchmarks()
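To illustrate what the two documented helpers compute, the sketch below gives a minimal NumPy analogue. It is an assumption about the module's behavior, not its actual source: the real ``torch_per_token_quant_int8`` operates on torch tensors, and ``bench_fn`` in the module may use CUDA events rather than wall-clock timing. Per-token (row-wise) symmetric int8 quantization picks one scale per token, ``scale = max(|x|) / 127``, then rounds and clips to the int8 range.

.. code-block:: python

   import time
   import numpy as np

   def per_token_quant_int8(x):
       """NumPy sketch of per-token symmetric int8 activation quantization.

       x: (num_tokens, hidden_dim) float array.
       Returns (q, scales): q is int8 with the same shape as x,
       scales has shape (num_tokens, 1) so that q * scales ~= x.
       """
       absmax = np.abs(x).max(axis=-1, keepdims=True)
       scales = absmax / 127.0
       scales = np.where(scales == 0, 1.0, scales)  # guard all-zero rows
       q = np.clip(np.rint(x / scales), -127, 127).astype(np.int8)
       return q, scales

   def bench_fn(fn, args, warmup=5, repeat=20):
       """Median wall-clock latency in ms (mirrors the documented signature)."""
       for _ in range(warmup):        # warm caches / JIT before timing
           fn(*args)
       times_ms = []
       for _ in range(repeat):
           t0 = time.perf_counter()
           fn(*args)
           times_ms.append((time.perf_counter() - t0) * 1e3)
       return float(np.median(times_ms))

Dequantizing with ``q * scales`` recovers each row to within one scale step, which is the usual correctness check when comparing the torch path against a future Triton kernel.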