pymllm.backends.qualcomm.transformers.core.qlinear

Classes

QLinear

QLinearW8A16_PerChannelSym

DoubleQuantizer

Implements the LPBQ double-quantization logic so it behaves like FakeQuantize

QLinearLPBQ

Module Contents

class pymllm.backends.qualcomm.transformers.core.qlinear.QLinear(in_features, out_features, bias=True)

Bases: torch.nn.Module

in_features
out_features
weight
act_quant = None
weight_quant = None
deploy_mode = False
freeze_weight()

PTQ core: observes the current weights, then computes and fixes the quantization scale and zero-point (see the usage sketch after this class entry).

abstractmethod forward(x)
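
A minimal sketch of the intended PTQ flow, using the concrete QLinearW8A16_PerChannelSym subclass below (the base class leaves forward abstract). The calibration loop, batch shapes, and feature sizes are illustrative assumptions:

    import torch

    # Hypothetical calibrate-then-freeze flow for a concrete QLinear subclass.
    layer = QLinearW8A16_PerChannelSym(in_features=256, out_features=512)

    # Observation pass: run representative data through the layer so the
    # attached quantizers can collect statistics.
    with torch.no_grad():
        for _ in range(8):
            layer(torch.randn(4, 256))

    # PTQ core step: observe the current weights and fix scale / zero-point.
    layer.freeze_weight()
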
class pymllm.backends.qualcomm.transformers.core.qlinear.QLinearW8A16_PerChannelSym(in_features, out_features, bias=True)

Bases: QLinear

weight_quant
forward(x)
convert_to_deploy()
convert_to_conv2d_deploy_hwio()

Convert to deploy format with HWIO layout [1, 1, In, Out]. This format is commonly used by convolution-based inference engines.
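For reference, a minimal sketch of the per-channel symmetric int8 quantize-dequantize that the class name suggests (W8A16: 8-bit weights, 16-bit activations). The exact rounding and observer behaviour inside weight_quant are assumptions:

    import torch

    def w8_per_channel_sym_fake_quant(w: torch.Tensor) -> torch.Tensor:
        # Symmetric per-output-channel int8: scale from each row's absolute
        # maximum, zero-point fixed at 0.
        max_abs = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
        scale = max_abs / 127.0
        q = torch.clamp(torch.round(w / scale), -127, 127)
        return q * scale
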

class pymllm.backends.qualcomm.transformers.core.qlinear.DoubleQuantizer(block_size=64)

Bases: torch.nn.Module

Implements the LPBQ double-quantization logic so it behaves like FakeQuantize

block_size = 64
w_recon_cached = None
freeze(w)
quantize_dequantize(w, save_buffers=False)
forward(w)
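
A rough sketch of LPBQ-style double quantization: per-block weight scales are themselves quantized against a per-channel scale, so only low-precision integer block scales need to be stored. Bit-widths and names here are illustrative assumptions, not the module's actual internals:

    import torch

    def lpbq_fake_quant(w: torch.Tensor, block_size: int = 64) -> torch.Tensor:
        # Assumes in_features is divisible by block_size.
        out_f, in_f = w.shape
        blocks = w.reshape(out_f, in_f // block_size, block_size)

        # Level 1: symmetric per-block scales for (assumed) 4-bit weights.
        block_scale = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0

        # Level 2: quantize the block scales against a per-channel scale
        # (assumed 8-bit), so only integer block scales are stored.
        chan_scale = block_scale.amax(dim=1, keepdim=True) / 255.0
        q_scale = torch.clamp(torch.round(block_scale / chan_scale), 1, 255)
        recon_scale = q_scale * chan_scale

        # Quantize-dequantize the weights with the reconstructed block scales.
        q = torch.clamp(torch.round(blocks / recon_scale), -7, 7)
        return (q * recon_scale).reshape(out_f, in_f)
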
class pymllm.backends.qualcomm.transformers.core.qlinear.QLinearLPBQ(in_features, out_features, bias=True, block_size=64)

Bases: QLinear

weight_quant
forward(x)
convert_to_deploy()
convert_to_conv2d_deploy_hwio()

Convert to deploy format with HWIO layout [1, 1, In, Out]. This format is commonly used by convolution-based inference engines.
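
The HWIO conversion that both convert_to_conv2d_deploy_hwio helpers describe amounts to transposing the usual torch.nn.Linear weight of shape [Out, In] and adding two unit spatial dimensions. A minimal sketch of that reshape, matching the layout the docstring names:

    import torch

    def linear_weight_to_hwio(w: torch.Tensor) -> torch.Tensor:
        # [Out, In] -> HWIO [1, 1, In, Out], the layout consumed by
        # 1x1-convolution-based inference engines.
        out_f, in_f = w.shape
        return w.t().contiguous().reshape(1, 1, in_f, out_f)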