pymllm.backends.qualcomm.transformers.core.embedding

Classes

QEmbedding: An embedding layer with fake-quantized weights.

Module Contents

class pymllm.backends.qualcomm.transformers.core.embedding.QEmbedding(num_embeddings, embedding_dim, padding_idx=None, quant_bits=16)

Bases: torch.nn.Module

An embedding layer whose weight is passed through a fake-quantization module (weight_fake_quant) during training and can be converted to an integer buffer for deployment.

num_embeddings
embedding_dim
padding_idx = None
quant_bits = 16
weight
weight_fake_quant
forward(x)

Looks up embeddings for the index tensor x, applying the weight fake-quantization when it is enabled.
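A minimal usage sketch, assuming the constructor signature listed above and the usual torch.nn.Embedding input convention (a LongTensor of token indices); the concrete sizes are illustrative:

   import torch
   from pymllm.backends.qualcomm.transformers.core.embedding import QEmbedding

   # Constructor arguments taken from the signature on this page.
   emb = QEmbedding(num_embeddings=32000, embedding_dim=4096,
                    padding_idx=0, quant_bits=16)

   # Assumed nn.Embedding-style input: a LongTensor of token ids.
   token_ids = torch.randint(0, 32000, (2, 8))
   out = emb(token_ids)   # forward(x); weight fake-quant applied when enabled
   print(out.shape)       # torch.Size([2, 8, 4096])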
convert_to_deploy()

Replaces self.weight in place, converting the floating-point Parameter into an integer buffer for deployment.
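An illustrative sketch of what such an in-place conversion typically looks like in PyTorch. The symmetric per-tensor quantization below, and reading the scale from weight_fake_quant, are assumptions about the internals, not the module's actual code:

   import torch

   def convert_weight_to_int_buffer(module):
       # Quantize the float weight to integers (assumed symmetric per-tensor scheme).
       scale = module.weight_fake_quant.scale   # assumed FakeQuantize-style attribute
       qmin = -(2 ** (module.quant_bits - 1))
       qmax = 2 ** (module.quant_bits - 1) - 1
       q_weight = torch.clamp(torch.round(module.weight / scale),
                              qmin, qmax).to(torch.int32)

       # Drop the float Parameter and register the integer tensor as a buffer,
       # so it is saved in the state dict but is no longer trainable.
       del module.weight
       module.register_buffer("weight", q_weight)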

freeze_weight()

Manually triggers the observer to compute the quantization scale from the weight, then locks it. Without this step, the first forward pass can produce all-zero output because no scale has been observed yet.
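A sketch of how such a freeze is commonly implemented with torch.ao.quantization FakeQuantize modules, assuming weight_fake_quant is one of them; one observed pass populates scale and zero_point, and disabling the observer afterwards locks them:

   def freeze_weight(module):
       # A single observed pass over the weight computes scale/zero_point ...
       module.weight_fake_quant.enable_observer()
       module.weight_fake_quant(module.weight)
       # ... and turning the observer off locks them, so the very first real
       # forward pass already quantizes with a valid, non-zero scale.
       module.weight_fake_quant.disable_observer()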

disable_fakequant()

Turns fake quantization off entirely, so the layer runs in pure floating point.

enable_fakequant()

Re-enables fake quantization of the weight.
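Continuing the construction sketch under forward(x), toggling this pair is a convenient way to measure the quantization error (usage sketch; emb and token_ids are assumed from that example):

   emb.disable_fakequant()
   fp_out = emb(token_ids)    # pure floating-point lookup

   emb.enable_fakequant()
   q_out = emb(token_ids)     # lookup with simulated quantization noise

   print((fp_out - q_out).abs().max())   # rough magnitude of the quantization error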
extra_repr()

Returns the extra representation string shown when the module is printed.
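Putting the methods together, a hedged end-to-end sketch of the lifecycle this page implies: freeze the scale first, run with fake quantization, then convert for deployment. The dtype printed at the end is an assumption; the page only states that the weight becomes an integer buffer:

   import torch
   from pymllm.backends.qualcomm.transformers.core.embedding import QEmbedding

   emb = QEmbedding(num_embeddings=32000, embedding_dim=4096, quant_bits=16)
   emb.freeze_weight()              # observe once and lock the scale up front

   token_ids = torch.randint(0, 32000, (2, 8))
   out = emb(token_ids)             # QAT-style forward with fake quantization

   emb.convert_to_deploy()          # float Parameter -> int buffer for export
   print(emb.weight.dtype)          # an integer dtype after conversion (assumed)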