pymllm.backends.qualcomm.transformers.core.qdq¶
Attributes¶
- DEFAULT_EPS_8BIT
- DEFAULT_EPS_16BIT
Classes¶
- ActivationQDQ: General activation Quantization-DeQuantization (QDQ) module.
- FixedActivationQDQ: Fixed activation Quantization-DeQuantization (QDQ) module.
Module Contents¶
- pymllm.backends.qualcomm.transformers.core.qdq.DEFAULT_EPS_8BIT = 3.921568627450981e-07¶
- pymllm.backends.qualcomm.transformers.core.qdq.DEFAULT_EPS_16BIT = 1.5259021896696422e-09¶
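These epsilon defaults are not explained on this page. As a hedged observation, both values match 1 / (2**bits - 1) scaled by 1e-4, i.e. the smallest quantization step for the given bit-width divided by 10,000, which would make them plausible lower bounds for a derived scale. The snippet below only verifies the arithmetic; the interpretation is an assumption, not documented behavior.

```python
import math

# Hedged observation (not stated by the module): each default equals
# 1 / (2**bits - 1) * 1e-4, the minimum quantization step for the bit-width
# scaled down by 1e-4 -- plausibly a floor for the computed scale.
assert math.isclose(3.921568627450981e-07, 1 / (2**8 - 1) * 1e-4, rel_tol=1e-12)
assert math.isclose(1.5259021896696422e-09, 1 / (2**16 - 1) * 1e-4, rel_tol=1e-12)
```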
- class pymllm.backends.qualcomm.transformers.core.qdq.ActivationQDQ(bits=8, qscheme=torch.per_tensor_affine)¶
Bases: torch.nn.Module
General activation Quantization-DeQuantization (QDQ) module. Supports both symmetric and asymmetric (affine) quantization. Uses torch.qint32 as a unified type to support various bit-widths. A usage sketch follows the member list below.
- bits = 8¶
- qscheme¶
- dtype¶
- fake_quant¶
- forward(x)¶
- enable_observer()¶
Enable tracking of min/max values to update scale and zero_point.
- disable_observer()¶
Freeze scale and zero_point calculation.
- enable_fakequant()¶
Enable simulation of quantization error.
- disable_fakequant()¶
Disable quantization simulation (the module acts as an identity).
- extra_repr()¶
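A minimal usage sketch for ActivationQDQ, assuming the module path above is importable and that forward(x) returns the (fake-)quantized tensor. The calibration loop, tensor shapes, and batch count are illustrative, not part of the documented API.

```python
import torch
from pymllm.backends.qualcomm.transformers.core.qdq import ActivationQDQ

# Per-tensor affine QDQ for activations; bits and qscheme are the documented defaults.
qdq = ActivationQDQ(bits=8, qscheme=torch.per_tensor_affine)

# Calibration: track activation min/max to update scale and zero_point,
# without injecting quantization error yet.
qdq.enable_observer()
qdq.disable_fakequant()
with torch.no_grad():
    for _ in range(8):              # illustrative calibration batches
        qdq(torch.randn(2, 16))

# Evaluation / QAT-style use: freeze the ranges and simulate quantization error.
qdq.disable_observer()
qdq.enable_fakequant()
y = qdq(torch.randn(2, 16))
```

The observe-then-freeze pattern mirrors standard PyTorch fake quantization: statistics are gathered during calibration, after which scale and zero_point stay fixed while quantization error is simulated.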
- class pymllm.backends.qualcomm.transformers.core.qdq.FixedActivationQDQ(scale, zero_point, bits=8, qscheme=torch.per_tensor_affine)¶
Bases: torch.nn.Module
Fixed activation Quantization-DeQuantization (QDQ) module. Uses pre-determined scale and zero_point instead of dynamic observation. Supports both symmetric and asymmetric (affine) quantization. Uses torch.qint32 as a unified type to support various bit-widths. A usage sketch follows the member list below.
- bits = 8¶
- qscheme¶
- dtype¶
- fake_quant¶
- forward(x)¶
- enable_observer()¶
No-op: FixedActivationQDQ does not use an observer.
- disable_observer()¶
No-op: FixedActivationQDQ does not use an observer.
- enable_fakequant()¶
Enable simulation of quantization error.
- disable_fakequant()¶
Disable quantization simulation (the module acts as an identity).
- property scale¶
Get the fixed scale value.
- property zero_point¶
Get the fixed zero_point value.
- extra_repr()¶
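A minimal sketch for FixedActivationQDQ, assuming the same import path. The scale and zero_point values below are illustrative placeholders (e.g. exported from a prior calibration run), not values recommended by the library.

```python
import torch
from pymllm.backends.qualcomm.transformers.core.qdq import FixedActivationQDQ

# Fixed QDQ with pre-determined quantization parameters (illustrative values).
qdq = FixedActivationQDQ(scale=0.02, zero_point=128,
                         bits=8, qscheme=torch.per_tensor_affine)

qdq.enable_fakequant()              # simulate quantization error with fixed params
y = qdq(torch.randn(2, 16))

print(qdq.scale, qdq.zero_point)    # read back the fixed parameters
qdq.enable_observer()               # documented as a no-op for this class
```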