pymllm.quantization.kernels.int8_activation_triton
==================================================

.. py:module:: pymllm.quantization.kernels.int8_activation_triton

.. autoapi-nested-parse::

   Per-token INT8 activation quantization using Triton.

   Ported from sglang ``int8_kernel.py`` (``per_token_quant_int8``).
   Original: sglang/srt/layers/quantization/int8_kernel.py:28-89

Functions
---------

.. autoapisummary::

   pymllm.quantization.kernels.int8_activation_triton.per_token_quant_int8

Module Contents
---------------

.. py:function:: per_token_quant_int8(x, scale_dtype = torch.float32)

   Per-token dynamic INT8 quantization.

   :param x: Input tensor, any shape with last dim = hidden_dim. Must be contiguous.
   :param scale_dtype: Dtype for the scale output (default ``torch.float32``).
   :returns: Tuple ``(x_q, scales)`` where ``x_q`` is the INT8 quantized tensor
             with the same shape as ``x``, and ``scales`` holds the per-token
             scales with shape ``x.shape[:-1] + (1,)``.
   :rtype: tuple[torch.Tensor, torch.Tensor]
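Per-token dynamic quantization assigns each token (each row along the last
dimension) its own scale, ``scale = max(|row|) / 127``, then rounds
``row / scale`` into the int8 range. Below is a minimal NumPy sketch of that
math for illustration; it is a reference implementation, not the Triton
kernel, and the name ``per_token_quant_int8_ref`` is ours. The zero-row guard
is an assumption about edge-case handling, not taken from the kernel source.

```python
import numpy as np

def per_token_quant_int8_ref(x: np.ndarray):
    """NumPy reference for per-token dynamic INT8 quantization.

    Mirrors the math of per_token_quant_int8 (a sketch, not the kernel):
    each token gets scale = max(|token|) / 127, and values are rounded
    into int8 at that scale.
    """
    # Per-token absolute maximum, kept as a trailing singleton dim so the
    # scales broadcast back against x and match shape x.shape[:-1] + (1,).
    absmax = np.max(np.abs(x), axis=-1, keepdims=True)
    scales = (absmax / 127.0).astype(np.float32)
    # Guard against all-zero tokens (assumed behavior): dividing by a
    # placeholder scale of 1.0 still yields all-zero int8 output.
    safe = np.where(scales == 0.0, np.float32(1.0), scales)
    x_q = np.clip(np.round(x / safe), -128, 127).astype(np.int8)
    return x_q, scales

x = np.array([[1.0, 3.0, -4.0]], dtype=np.float32)
x_q, scales = per_token_quant_int8_ref(x)
# Dequantizing recovers the input to within one quantization step:
x_dq = x_q.astype(np.float32) * scales
```

Because the scale is recomputed per token at runtime, outlier tokens do not
degrade the precision of other tokens, which is the motivation for per-token
(rather than per-tensor) activation quantization.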