bench_w8a8_activation_quant
===========================

.. py:module:: bench_w8a8_activation_quant

.. autoapi-nested-parse::

   Benchmark W8A8 activation quantization implementations.

   Covers: torch path (current) and (future) Triton kernel.
   This script is reusable across phases.

   Usage::

       python pymllm/tests/bench_w8a8_activation_quant.py

Functions
---------

.. autoapisummary::

   bench_w8a8_activation_quant.torch_per_token_quant_int8
   bench_w8a8_activation_quant.bench_fn
   bench_w8a8_activation_quant.run_benchmarks

Module Contents
---------------

.. py:function:: torch_per_token_quant_int8(x)

   Current torch-based activation quantization.

.. py:function:: bench_fn(fn, args, warmup=5, repeat=20)

   Returns median latency in ms.

.. py:function:: run_benchmarks()
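To illustrate what the two documented helpers compute, the sketch below gives a minimal NumPy analogue. It is an assumption about the module's behavior, not its actual source: the real ``torch_per_token_quant_int8`` operates on torch tensors, and ``bench_fn`` in the module may use CUDA events rather than wall-clock timing. Per-token (row-wise) symmetric int8 quantization picks one scale per token, ``scale = max(|x|) / 127``, then rounds and clips to the int8 range.

.. code-block:: python

   import time
   import numpy as np

   def per_token_quant_int8(x):
       """NumPy sketch of per-token symmetric int8 activation quantization.

       x: (num_tokens, hidden_dim) float array.
       Returns (q, scales): q is int8 with the same shape as x,
       scales has shape (num_tokens, 1) so that q * scales ~= x.
       """
       absmax = np.abs(x).max(axis=-1, keepdims=True)
       scales = absmax / 127.0
       scales = np.where(scales == 0, 1.0, scales)  # guard all-zero rows
       q = np.clip(np.rint(x / scales), -127, 127).astype(np.int8)
       return q, scales

   def bench_fn(fn, args, warmup=5, repeat=20):
       """Median wall-clock latency in ms (mirrors the documented signature)."""
       for _ in range(warmup):        # warm caches / JIT before timing
           fn(*args)
       times_ms = []
       for _ in range(repeat):
           t0 = time.perf_counter()
           fn(*args)
           times_ms.append((time.perf_counter() - t0) * 1e3)
       return float(np.median(times_ms))

Dequantizing with ``q * scales`` recovers each row to within one scale step, which is the usual correctness check when comparing the torch path against a future Triton kernel.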