pymllm.backends.qualcomm.transformers.core.qdq

Classes

ActivationQDQ

General activation Quantization-DeQuantization (QDQ) module.

Module Contents

class pymllm.backends.qualcomm.transformers.core.qdq.ActivationQDQ(bits=8, qscheme=torch.per_tensor_affine)

Bases: torch.nn.Module

General activation Quantization-DeQuantization (QDQ) module. Supports both symmetric and asymmetric (affine) quantization. Uses torch.qint32 as a unified storage type so that various bit-widths can share one code path.
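The quantize-dequantize round trip this module simulates can be sketched in plain Python. The `fake_quantize` helper below is hypothetical and not part of the pymllm API (the real module wraps torch's fake-quantization ops); it only illustrates the math:

```python
def fake_quantize(x, scale, zero_point, bits=8, symmetric=False):
    """Quantize a float to a `bits`-bit integer grid, then dequantize it back."""
    if symmetric:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # e.g. -128..127
    else:
        qmin, qmax = 0, 2 ** bits - 1                          # e.g. 0..255
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))        # clamp to the representable range
    return (q - zero_point) * scale    # dequantize: approximate original x
```

The difference `x - fake_quantize(x, ...)` is the quantization error that fake-quant mode injects during training.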

bits = 8
qscheme
dtype
fake_quant
forward(x)

Apply quantize-dequantize to the input x; acts as an identity when fake quantization is disabled.
enable_observer()

Enable tracking of min/max values to update scale and zero_point.
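When the observer is enabled, the tracked min/max range is folded into the quantization parameters. A sketch of that computation for the asymmetric (affine) case, using a hypothetical `affine_qparams` helper that is not the pymllm implementation:

```python
def affine_qparams(x_min, x_max, bits=8):
    """Derive (scale, zero_point) mapping [x_min, x_max] onto [0, 2**bits - 1]."""
    qmin, qmax = 0, 2 ** bits - 1
    # Extend the range to include 0 so that zero quantizes exactly
    # (important for padding and post-ReLU activations).
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard against a zero range
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))
```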

disable_observer()

Freeze scale and zero_point calculation.

enable_fakequant()

Enable simulation of quantization error.

disable_fakequant()

Disable quantization simulation (act as identity).
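Together, the four toggle methods form a small state machine: the observer runs during calibration, then is frozen while fake quantization simulates inference error during training. A self-contained sketch of that workflow (illustrative only; method names mirror the documented API, but the internals are not the pymllm implementation, which delegates to torch fake-quant ops):

```python
class QDQSketch:
    """Toy stand-in for ActivationQDQ showing the observer/fake-quant toggles."""

    def __init__(self, bits=8):
        self.bits = bits
        self.observing = True      # track running min/max (calibration)
        self.fake_quant = True     # inject simulated quantization error
        self.x_min, self.x_max = 0.0, 0.0

    def enable_observer(self):   self.observing = True
    def disable_observer(self):  self.observing = False
    def enable_fakequant(self):  self.fake_quant = True
    def disable_fakequant(self): self.fake_quant = False

    def forward(self, x):
        if self.observing:                       # update running range
            self.x_min = min(self.x_min, x)
            self.x_max = max(self.x_max, x)
        if not self.fake_quant:                  # act as identity
            return x
        qmax = 2 ** self.bits - 1
        scale = (self.x_max - self.x_min) / qmax or 1.0
        zero_point = round(-self.x_min / scale)
        q = max(0, min(qmax, round(x / scale) + zero_point))
        return (q - zero_point) * scale
```

Typical use: run representative inputs through `forward` to calibrate, call `disable_observer()` to freeze scale and zero_point, then keep fake quantization enabled while training.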

extra_repr()

Return extra information (the quantization configuration) included in the module's printed representation.