runner
Classes
Functions
Module Contents
- runner.freeze_qwen3_rmsnorm_weight(m)
- runner.freeze_qwen3_linear_weight(m)
- runner.disable_qdq_observer(m)
- runner.enable_qdq_observer(m)
- runner.convert_weight(m)
- class runner.Qwen3Quantizer(model_path, mllm_qualcomm_max_length=2048)
- Parameters:
model_path (str)
- tokenizer
- model
- mllm_qualcomm_max_length = 2048
- freeze_activation()
- enable_activation_update()
- compile()
- infer(prompt)
- Parameters:
prompt (str)
- calibrate(num_samples=64, max_seq_length=512)
Perform post-training quantization (PTQ) calibration using the Wikipedia dataset.
- Parameters:
num_samples: Number of samples used for calibration
max_seq_length: Maximum length of each sample, not exceeding mllm_qualcomm_max_length
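The idea behind PTQ calibration can be illustrated with a small, self-contained toy. This is a sketch under stated assumptions, not the implementation in this module: `MinMaxObserver`, `clamp_seq_length`, and `toy_calibrate` are hypothetical names, the samples are plain lists of floats rather than tokenized Wikipedia text, and clamping `max_seq_length` to the model limit is only one possible reading of "not exceeding".

```python
from dataclasses import dataclass


@dataclass
class MinMaxObserver:
    """Hypothetical toy observer: tracks running min/max while enabled."""
    enabled: bool = True
    lo: float = float("inf")
    hi: float = float("-inf")

    def observe(self, values):
        # Ignore data once the observer has been disabled (frozen).
        if not self.enabled or not values:
            return
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))

    def scale(self, num_bits=8):
        # Symmetric scale derived from the largest observed magnitude.
        max_abs = max(abs(self.lo), abs(self.hi))
        return max_abs / (2 ** (num_bits - 1) - 1)


def clamp_seq_length(max_seq_length, model_max_length=2048):
    # Assumption: a requested length above the model limit is clamped to it.
    return min(max_seq_length, model_max_length)


def toy_calibrate(observer, samples, max_seq_length=512, model_max_length=2048):
    limit = clamp_seq_length(max_seq_length, model_max_length)
    for sample in samples:
        # Each sample is truncated to the effective length before observation.
        observer.observe(sample[:limit])
    # Freeze the statistics once calibration is finished.
    observer.enabled = False
    return observer
```

For example, `toy_calibrate(MinMaxObserver(), [[0.5, -1.27, 9.0], [1.0, 0.2, -7.0]], max_seq_length=2)` only sees the first two values of each sample, so the third values never influence the recorded range, and later calls to `observe` have no effect because the observer is frozen.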
- convert()
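The methods listed for Qwen3Quantizer suggest a calibrate-then-convert workflow. The helper below is a minimal sketch of one plausible ordering; `run_ptq` is a hypothetical name, and both the call order and the assumption that `infer` returns something useful are guesses that this page does not confirm. It only uses method names and keyword arguments that appear in the signatures above, and it accepts any object exposing them.

```python
def run_ptq(quantizer, prompt, num_samples=64, max_seq_length=512):
    """Drive a Qwen3Quantizer-like object through an assumed PTQ sequence."""
    # 1. Collect activation statistics on the calibration data.
    quantizer.calibrate(num_samples=num_samples, max_seq_length=max_seq_length)
    # 2. Stop further activation updates (assumed to come before conversion).
    quantizer.freeze_activation()
    # 3. Run a quick generation check with the calibrated model.
    output = quantizer.infer(prompt)
    # 4. Convert the model to its quantized form.
    quantizer.convert()
    # Whatever infer() returned is passed through unchanged.
    return output
```

Because the helper is duck-typed, it can be exercised with a stub object before being pointed at a real quantizer instance.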