pymllm.backends.qualcomm.transformers.core.embedding

Classes

QEmbedding: An embedding layer with fake-quantized weights.

Module Contents

class pymllm.backends.qualcomm.transformers.core.embedding.QEmbedding(num_embeddings, embedding_dim, padding_idx=None, quant_bits=16)

Bases: torch.nn.Module

An embedding layer whose weight is passed through a fake-quantization module (weight_fake_quant) during training and can be converted to an integer buffer for deployment.

num_embeddings
embedding_dim
padding_idx = None
quant_bits = 16
weight
weight_fake_quant
forward(x)

Looks up embeddings for the index tensor x, applying the weight fake-quantization when it is enabled.
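A minimal usage sketch, assuming the constructor signature listed above and the usual torch.nn.Embedding input convention (a LongTensor of token indices); the concrete sizes are illustrative:

   import torch
   from pymllm.backends.qualcomm.transformers.core.embedding import QEmbedding

   # Constructor arguments taken from the signature on this page.
   emb = QEmbedding(num_embeddings=32000, embedding_dim=4096,
                    padding_idx=0, quant_bits=16)

   # Assumed nn.Embedding-style input: a LongTensor of token ids.
   token_ids = torch.randint(0, 32000, (2, 8))
   out = emb(token_ids)   # forward(x); weight fake-quant applied when enabled
   print(out.shape)       # torch.Size([2, 8, 4096])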
convert_to_deploy()

Replaces self.weight in place, converting the floating-point Parameter into an integer buffer for deployment.
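An illustrative sketch of what such an in-place conversion typically looks like in PyTorch. The symmetric per-tensor quantization below, and reading the scale from weight_fake_quant, are assumptions about the internals, not the module's actual code:

   import torch

   def convert_weight_to_int_buffer(module):
       # Quantize the float weight to integers (assumed symmetric per-tensor scheme).
       scale = module.weight_fake_quant.scale   # assumed FakeQuantize-style attribute
       qmin = -(2 ** (module.quant_bits - 1))
       qmax = 2 ** (module.quant_bits - 1) - 1
       q_weight = torch.clamp(torch.round(module.weight / scale),
                              qmin, qmax).to(torch.int32)

       # Drop the float Parameter and register the integer tensor as a buffer,
       # so it is saved in the state dict but is no longer trainable.
       del module.weight
       module.register_buffer("weight", q_weight)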

freeze_weight()

Manually triggers the observer to compute the quantization scale from the weight, then locks it. Without this step, the first forward pass can produce all-zero output because no scale has been observed yet.
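A sketch of how such a freeze is commonly implemented with torch.ao.quantization FakeQuantize modules, assuming weight_fake_quant is one of them; one observed pass populates scale and zero_point, and disabling the observer afterwards locks them:

   def freeze_weight(module):
       # A single observed pass over the weight computes scale/zero_point ...
       module.weight_fake_quant.enable_observer()
       module.weight_fake_quant(module.weight)
       # ... and turning the observer off locks them, so the very first real
       # forward pass already quantizes with a valid, non-zero scale.
       module.weight_fake_quant.disable_observer()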

disable_fakequant()

Turns fake quantization off entirely, so the layer runs in pure floating point.

enable_fakequant()

Re-enables fake quantization of the weight.
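Continuing the construction sketch under forward(x), toggling this pair is a convenient way to measure the quantization error (usage sketch; emb and token_ids are assumed from that example):

   emb.disable_fakequant()
   fp_out = emb(token_ids)    # pure floating-point lookup

   emb.enable_fakequant()
   q_out = emb(token_ids)     # lookup with simulated quantization noise

   print((fp_out - q_out).abs().max())   # rough magnitude of the quantization error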
extra_repr()

Returns the extra representation string shown when the module is printed.
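Putting the methods together, a hedged end-to-end sketch of the lifecycle this page implies: freeze the scale first, run with fake quantization, then convert for deployment. The dtype printed at the end is an assumption; the page only states that the weight becomes an integer buffer:

   import torch
   from pymllm.backends.qualcomm.transformers.core.embedding import QEmbedding

   emb = QEmbedding(num_embeddings=32000, embedding_dim=4096, quant_bits=16)
   emb.freeze_weight()              # observe once and lock the scale up front

   token_ids = torch.randint(0, 32000, (2, 8))
   out = emb(token_ids)             # QAT-style forward with fake quantization

   emb.convert_to_deploy()          # float Parameter -> int buffer for export
   print(emb.weight.dtype)          # an integer dtype after conversion (assumed)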