pymllm.backends.qualcomm.transformers.core.embedding¶
Classes¶
Module Contents¶
- class pymllm.backends.qualcomm.transformers.core.embedding.QEmbedding(num_embeddings, embedding_dim, padding_idx=None, quant_bits=16)¶
Bases: torch.nn.Module
- num_embeddings¶
- embedding_dim¶
- padding_idx = None¶
- quant_bits = 16¶
- weight¶
- weight_fake_quant¶
- forward(x)¶
- convert_to_deploy()¶
In-place replacement of self.weight for deployment: the floating-point Parameter is converted to an integer Buffer.
- freeze_weight()¶
Manually triggers the Observer to collect statistics, computes the quantization scale, and then locks it. This avoids the output being all zeros on the first forward pass.
- disable_fakequant()¶
Disables fake quantization entirely, returning the module to floating-point mode.
- enable_fakequant()¶
- extra_repr()¶
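The quantization scheme suggested by quant_bits=16 and the Float Parameter -> Int Buffer conversion can be sketched as symmetric per-tensor quantization. The snippet below is an illustrative, self-contained sketch of that arithmetic, not the pymllm implementation; all helper names (compute_scale, quantize, dequantize) are hypothetical.

```python
# Sketch of symmetric per-tensor weight quantization, assuming this is
# roughly what a QEmbedding with quant_bits=16 applies to its weights.
# All names here are illustrative, not part of the pymllm API.

def compute_scale(weights, quant_bits=16):
    """Scale an Observer might derive: map max |w| onto the int range."""
    qmax = 2 ** (quant_bits - 1) - 1  # 32767 for 16 bits
    max_abs = max(abs(w) for w in weights)
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(weights, scale, quant_bits=16):
    """Float -> int, mirroring the Float Parameter -> Int Buffer step."""
    qmax = 2 ** (quant_bits - 1) - 1
    qmin = -qmax - 1
    return [min(max(round(w / scale), qmin), qmax) for w in weights]

def dequantize(q_weights, scale):
    """Int -> float, as an embedding lookup would do after deployment."""
    return [q * scale for q in q_weights]

row = [0.5, -1.25, 0.003, 2.0]   # one embedding row of float weights
scale = compute_scale(row)
q = quantize(row, scale)
deq = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(row, deq))
print(max_err < scale)  # round-trip error stays within one quantization step
```

Under this sketch, freeze_weight() corresponds to computing and locking `scale` before the first forward pass, and convert_to_deploy() corresponds to storing the integer values from quantize() in place of the float weights.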