pymllm.backends.qualcomm.transformers.core.qdq

Attributes

DEFAULT_EPS_8BIT

DEFAULT_EPS_16BIT

Classes

ActivationQDQ

General activation Quantization-DeQuantization (QDQ) module.

FixedActivationQDQ

Fixed activation Quantization-DeQuantization (QDQ) module.

Module Contents

pymllm.backends.qualcomm.transformers.core.qdq.DEFAULT_EPS_8BIT = 3.921568627450981e-07
pymllm.backends.qualcomm.transformers.core.qdq.DEFAULT_EPS_16BIT = 1.5259021896696422e-09
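The numeric defaults reproduce 1e-4 / (2**bits - 1); reading them as a small fraction of one quantization step, used as a floor when deriving scales, is an assumption rather than something stated on this page. A minimal sketch:

    # Hedged sketch: the constants match 1e-4 / (2**bits - 1); their role as a
    # minimum-scale epsilon is assumed, not documented here.
    def default_eps(bits: int) -> float:
        return 1e-4 / (2 ** bits - 1)

    print(default_eps(8))   # ~3.92e-07 (DEFAULT_EPS_8BIT)
    print(default_eps(16))  # ~1.53e-09 (DEFAULT_EPS_16BIT)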
class pymllm.backends.qualcomm.transformers.core.qdq.ActivationQDQ(bits=8, qscheme=torch.per_tensor_affine)

Bases: torch.nn.Module

General activation Quantization-DeQuantization (QDQ) module. Supports both Symmetric and Asymmetric (Affine) quantization. Uses torch.qint32 as a unified type to support various bit-widths.

bits = 8
qscheme
dtype
fake_quant
forward(x)
enable_observer()

Enable tracking of min/max values to update scale and zero_point.

disable_observer()

Freeze scale and zero_point calculation.

enable_fakequant()

Enable simulation of quantization error.

disable_fakequant()

Disable quantization simulation (the module acts as an identity function).

extra_repr()
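
A minimal usage sketch for ActivationQDQ, assuming a typical calibrate-then-simulate flow; the ordering of the observer/fake-quant toggles is an assumption, not prescribed by this page:

    import torch
    from pymllm.backends.qualcomm.transformers.core.qdq import ActivationQDQ

    qdq = ActivationQDQ(bits=8, qscheme=torch.per_tensor_affine)

    # Calibration: track activation min/max to fit scale and zero_point.
    qdq.enable_observer()
    qdq.disable_fakequant()
    for batch in [torch.randn(4, 16) for _ in range(8)]:  # stand-in calibration data
        qdq(batch)

    # Evaluation: freeze the observed range and simulate quantization error.
    qdq.disable_observer()
    qdq.enable_fakequant()
    y = qdq(torch.randn(4, 16))  # quantize-dequantize applied to the activation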
class pymllm.backends.qualcomm.transformers.core.qdq.FixedActivationQDQ(scale, zero_point, bits=8, qscheme=torch.per_tensor_affine)

Bases: torch.nn.Module

Fixed activation Quantization-DeQuantization (QDQ) module. Uses pre-determined scale and zero_point instead of dynamic observation. Supports both Symmetric and Asymmetric (Affine) quantization. Uses torch.qint32 as a unified type to support various bit-widths.

bits = 8
qscheme
dtype
fake_quant
forward(x)
enable_observer()

No-op: FixedActivationQDQ does not use an observer.

disable_observer()

No-op: FixedActivationQDQ does not use an observer.

enable_fakequant()

Enable simulation of quantization error.

disable_fakequant()

Disable quantization simulation (the module acts as an identity function).

property scale

Get the fixed scale value.

property zero_point

Get the fixed zero_point value.

extra_repr()
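
A minimal sketch for FixedActivationQDQ with pre-determined parameters; the concrete scale and zero_point values below are illustrative, and whether they are accepted as Python scalars or tensors is not stated on this page:

    import torch
    from pymllm.backends.qualcomm.transformers.core.qdq import FixedActivationQDQ

    fixed = FixedActivationQDQ(scale=0.05, zero_point=128, bits=8,
                               qscheme=torch.per_tensor_affine)

    fixed.enable_fakequant()               # simulate quantization error
    y = fixed(torch.randn(4, 8))

    print(fixed.scale, fixed.zero_point)   # fixed parameters exposed as read-only properties
    fixed.disable_fakequant()              # the module now acts as an identity function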