runner

Module Contents

runner.freeze_qwen3_rmsnorm_weight(m)
runner.freeze_qwen3_linear_weight(m)
runner.disable_qdq_observer(m)
runner.enable_qdq_observer(m)
runner.convert_weight(m)
class runner.Qwen3Quantizer(model_path, mllm_qualcomm_max_length=2048)
Parameters:

model_path (str)

tokenizer
model
mllm_qualcomm_max_length = 2048
freeze_activation()
enable_activation_update()
compile()
infer(prompt)
Parameters:

prompt (str)

calibrate(num_samples=64, max_seq_length=512)

Perform post-training quantization (PTQ) calibration using the Wikipedia dataset.

Parameters:

num_samples – Number of samples used for calibration.

max_seq_length – Maximum length of each sample; must not exceed mllm_qualcomm_max_length.

convert()
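
The methods above suggest a calibrate-then-convert flow. The following is a minimal sketch of that flow, not the module's documented usage: the call order (calibrate() followed by convert()) is an assumption inferred from the method names, and clamp_seq_length and run_ptq are hypothetical helpers that are not part of runner. The sketch only relies on the attribute and method names listed on this page.

```python
from typing import Any


def clamp_seq_length(max_seq_length: int, limit: int = 2048) -> int:
    """Hypothetical helper: keep the calibration length within the model limit.

    The calibrate() description says max_seq_length must not exceed
    mllm_qualcomm_max_length; this sketch enforces that by clamping.
    """
    if max_seq_length <= 0:
        raise ValueError("max_seq_length must be positive")
    return min(max_seq_length, limit)


def run_ptq(quantizer: Any, num_samples: int = 64, max_seq_length: int = 512) -> None:
    """Hypothetical driver: calibrate, then convert.

    `quantizer` is expected to expose mllm_qualcomm_max_length, calibrate()
    and convert(), as listed for runner.Qwen3Quantizer. The order of the two
    calls is an assumption, not something this page states.
    """
    limit = quantizer.mllm_qualcomm_max_length
    quantizer.calibrate(
        num_samples=num_samples,
        max_seq_length=clamp_seq_length(max_seq_length, limit),
    )
    quantizer.convert()
```

With the real class this would presumably be called as run_ptq(runner.Qwen3Quantizer(model_path)), where model_path points at a local Qwen3 checkpoint. Whether freeze_activation(), compile(), or infer() belong in this sequence is not stated here, so they are left out.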