pymllm.backends.qualcomm.transformers.core.qdq

Attributes

DEFAULT_EPS_8BIT

DEFAULT_EPS_16BIT

Classes

ActivationQDQ

General activation Quantization-DeQuantization (QDQ) module.

FixedActivationQDQ

Fixed activation Quantization-DeQuantization (QDQ) module.

Module Contents

pymllm.backends.qualcomm.transformers.core.qdq.DEFAULT_EPS_8BIT = 3.921568627450981e-07
pymllm.backends.qualcomm.transformers.core.qdq.DEFAULT_EPS_16BIT = 1.5259021896696422e-09
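The numeric defaults reproduce 1e-4 / (2**bits - 1); reading them as a small fraction of one quantization step, used as a floor when deriving scales, is an assumption rather than something stated on this page. A minimal sketch:

    # Hedged sketch: the constants match 1e-4 / (2**bits - 1); their role as a
    # minimum-scale epsilon is assumed, not documented here.
    def default_eps(bits: int) -> float:
        return 1e-4 / (2 ** bits - 1)

    print(default_eps(8))   # ~3.92e-07 (DEFAULT_EPS_8BIT)
    print(default_eps(16))  # ~1.53e-09 (DEFAULT_EPS_16BIT)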
class pymllm.backends.qualcomm.transformers.core.qdq.ActivationQDQ(bits=8, qscheme=torch.per_tensor_affine)

Bases: torch.nn.Module

General activation Quantization-DeQuantization (QDQ) module. Supports both Symmetric and Asymmetric (Affine) quantization. Uses torch.qint32 as a unified type to support various bit-widths.

bits = 8
qscheme
dtype
fake_quant
forward(x)
enable_observer()

Enable tracking of min/max values to update scale and zero_point.

disable_observer()

Freeze scale and zero_point calculation.

enable_fakequant()

Enable simulation of quantization error.

disable_fakequant()

Disable quantization simulation (the module acts as an identity function).

extra_repr()
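
A minimal usage sketch for ActivationQDQ, assuming a typical calibrate-then-simulate flow; the ordering of the observer/fake-quant toggles is an assumption, not prescribed by this page:

    import torch
    from pymllm.backends.qualcomm.transformers.core.qdq import ActivationQDQ

    qdq = ActivationQDQ(bits=8, qscheme=torch.per_tensor_affine)

    # Calibration: track activation min/max to fit scale and zero_point.
    qdq.enable_observer()
    qdq.disable_fakequant()
    for batch in [torch.randn(4, 16) for _ in range(8)]:  # stand-in calibration data
        qdq(batch)

    # Evaluation: freeze the observed range and simulate quantization error.
    qdq.disable_observer()
    qdq.enable_fakequant()
    y = qdq(torch.randn(4, 16))  # quantize-dequantize applied to the activation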
class pymllm.backends.qualcomm.transformers.core.qdq.FixedActivationQDQ(scale, zero_point, bits=8, qscheme=torch.per_tensor_affine)

Bases: torch.nn.Module

Fixed activation Quantization-DeQuantization (QDQ) module. Uses pre-determined scale and zero_point instead of dynamic observation. Supports both Symmetric and Asymmetric (Affine) quantization. Uses torch.qint32 as a unified type to support various bit-widths.

bits = 8
qscheme
dtype
fake_quant
forward(x)
enable_observer()

No-op: FixedActivationQDQ does not use an observer.

disable_observer()

No-op: FixedActivationQDQ does not use an observer.

enable_fakequant()

Enable simulation of quantization error.

disable_fakequant()

Disable quantization simulation (the module acts as an identity function).

property scale

Get the fixed scale value.

property zero_point

Get the fixed zero_point value.

extra_repr()
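
A minimal sketch for FixedActivationQDQ with pre-determined parameters; the concrete scale and zero_point values below are illustrative, and whether they are accepted as Python scalars or tensors is not stated on this page:

    import torch
    from pymllm.backends.qualcomm.transformers.core.qdq import FixedActivationQDQ

    fixed = FixedActivationQDQ(scale=0.05, zero_point=128, bits=8,
                               qscheme=torch.per_tensor_affine)

    fixed.enable_fakequant()               # simulate quantization error
    y = fixed(torch.randn(4, 8))

    print(fixed.scale, fixed.zero_point)   # fixed parameters exposed as read-only properties
    fixed.disable_fakequant()              # the module now acts as an identity function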