pymllm.backends.qualcomm.transformers.core.qlinear

Classes

QLinear
QLinearW8A16_PerChannelSym
QLinearLPBQ

Module Contents

class pymllm.backends.qualcomm.transformers.core.qlinear.QLinear(in_features, out_features, bias=True)

Bases: torch.nn.Module

in_features
out_features
weight
act_quant = None
weight_quant = None
deploy_mode = False
freeze_weight()

Core PTQ step: observe the current weights, then compute and freeze their quantization scale and zero-point.

abstractmethod forward(x)
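
QLinear is the abstract base for the quantized linear variants below: a module starts out of deploy mode (deploy_mode = False), freeze_weight() performs the post-training-quantization (PTQ) observation that fixes the weight scale/zero-point, and each subclass provides its own forward(x). The sketch below is not this module's implementation; it only illustrates, assuming 8-bit per-channel symmetric weights, the kind of computation such an observation step performs (the helper name observe_per_channel_sym is hypothetical):

```python
import torch

def observe_per_channel_sym(weight: torch.Tensor, n_bits: int = 8):
    """Illustrative PTQ observation for a [out_features, in_features] weight.

    Per-output-channel symmetric quantization: the zero-point is 0 and the
    scale maps each channel's max |w| onto the signed integer range.
    """
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 127 for int8
    max_abs = weight.abs().amax(dim=1, keepdim=True)  # [out_features, 1]
    scale = (max_abs / qmax).clamp(min=1e-8)          # avoid division by zero
    zero_point = torch.zeros_like(scale)              # symmetric => ZP = 0
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    w_fakequant = q * scale                           # dequantized ("fake-quant") weights
    return scale, zero_point, w_fakequant

if __name__ == "__main__":
    w = torch.randn(16, 64)
    scale, zp, w_fq = observe_per_channel_sym(w)
    print(scale.shape, zp.shape, (w - w_fq).abs().max())
```
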
class pymllm.backends.qualcomm.transformers.core.qlinear.QLinearW8A16_PerChannelSym(in_features, out_features, bias=True)

Bases: QLinear

weight_quant
forward(x)
convert_to_deploy()
convert_to_conv2d_deploy_hwio()

Convert to deploy format with HWIO layout [1, 1, In, Out]. This format is commonly used by convolution-based inference engines.
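
As a minimal sketch of that layout change, assuming the stored weight has the usual torch.nn.Linear shape [out_features, in_features] (the function name below is hypothetical, not this class's API):

```python
import torch

def linear_weight_to_hwio(weight: torch.Tensor) -> torch.Tensor:
    """Reshape a [out_features, in_features] linear weight to HWIO [1, 1, In, Out]."""
    out_features, in_features = weight.shape
    # Transpose to [In, Out], then add the 1x1 spatial dims expected by HWIO.
    return weight.t().contiguous().reshape(1, 1, in_features, out_features)

if __name__ == "__main__":
    w = torch.randn(8, 32)        # [Out, In]
    hwio = linear_weight_to_hwio(w)
    print(hwio.shape)             # torch.Size([1, 1, 32, 8])
```

In this layout the linear layer can be executed as a 1x1 convolution, which is why deploy formats for convolution-based inference engines expect it.
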

class pymllm.backends.qualcomm.transformers.core.qlinear.QLinearLPBQ(in_features, out_features, bias=True, block_size=64)

Bases: QLinear

block_size
weight_quant
enable_fakequant()
disable_fakequant()
forward(x)
convert_to_conv2d_deploy_hwio()
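
QLinearLPBQ quantizes the weight in blocks of block_size elements along the input dimension, and enable_fakequant()/disable_fakequant() toggle simulated quantization during calibration. LPBQ schemes additionally constrain how the block scales are stored; the sketch below omits that and only illustrates the basic block-wise scale computation, assuming symmetric quantization along in_features (names are illustrative, not this module's API):

```python
import torch

def blockwise_sym_fakequant(weight: torch.Tensor, block_size: int = 64, n_bits: int = 4):
    """Illustrative block-wise symmetric fake quantization.

    The [out_features, in_features] weight is split into blocks of
    `block_size` along in_features; each block gets its own scale.
    """
    out_features, in_features = weight.shape
    assert in_features % block_size == 0, "in_features must be divisible by block_size"
    qmax = 2 ** (n_bits - 1) - 1
    blocks = weight.reshape(out_features, in_features // block_size, block_size)
    scale = (blocks.abs().amax(dim=-1, keepdim=True) / qmax).clamp(min=1e-8)
    q = torch.clamp(torch.round(blocks / scale), -qmax - 1, qmax)
    return (q * scale).reshape(out_features, in_features)

if __name__ == "__main__":
    w = torch.randn(16, 128)
    w_fq = blockwise_sym_fakequant(w, block_size=64)
    print((w - w_fq).abs().mean())
```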