runner

Classes

Qwen2Quantizer(model_path[, mllm_qualcomm_max_length])

Functions

recompute_scale_zp(module)

Callback function: forcibly refreshes the scale and zero_point of all FakeQuantize modules after calibration.

validate_concat_observer_fn(module, results[, name])

Callback function: validates that all input_observers of a ConcatObserver have consistent scale and zero_point values.

freeze_qwen2_rmsnorm_weight(m)

freeze_qwen2_linear_weight(m)

freeze_qwen2_embed_tokens_weight(m)

disable_qdq_observer(m)

enable_qdq_observer(m)

enable_fake_quant(m)

disable_fake_quant(m)

convert_weight(m)

Module Contents

runner.recompute_scale_zp(module)

Callback function: forcibly refreshes the scale and zero_point of all FakeQuantize modules after calibration.

Problem solved:

When a ConcatObserver is used, min/max may be updated during the forward pass, but at the end of the forward pass the scale/zero_point stored in the FakeQuantize module's internal buffers are still computed from the old min/max. This function forces a calculate_qparams() call to sync the latest parameters into those buffers.

Usage:

model.apply(recompute_scale_zp)
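The stale-buffer problem described above can be reproduced without PyTorch. The following is a minimal, framework-free sketch, assuming a toy observer (ToyObserver) and a toy fake-quant module (ToyFakeQuantize) that caches its scale/zero_point the way FakeQuantize caches them in buffers. Both classes, and the shared-observer setup standing in for ConcatObserver, are hypothetical illustrations, not the runner or PyTorch API.

```python
from dataclasses import dataclass


@dataclass
class ToyObserver:
    """Hypothetical min/max observer; stands in for the real observer."""
    min_val: float = float("inf")
    max_val: float = float("-inf")

    def observe(self, values):
        # Track the running min/max of everything seen so far.
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def calculate_qparams(self, qmin=0, qmax=255):
        # Asymmetric affine qparams; the range is widened to include 0.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = max((hi - lo) / (qmax - qmin), 1e-8)
        zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
        return scale, zero_point


class ToyFakeQuantize:
    """Hypothetical fake-quant module that caches scale/zero_point."""

    def __init__(self, observer):
        self.observer = observer
        self.scale, self.zero_point = 1.0, 0

    def forward(self, values):
        # Observe, then cache qparams computed from the current range.
        self.observer.observe(values)
        self.scale, self.zero_point = self.observer.calculate_qparams()
        return values


def recompute_scale_zp(module):
    # Re-derive qparams from the latest min/max and overwrite the cached copy.
    if isinstance(module, ToyFakeQuantize):
        module.scale, module.zero_point = module.observer.calculate_qparams()


# Two concat inputs share one observer (a stand-in for ConcatObserver).
shared = ToyObserver()
fq_a, fq_b = ToyFakeQuantize(shared), ToyFakeQuantize(shared)
fq_a.forward([-1.0, 1.0])  # fq_a caches qparams for the range [-1, 1]
fq_b.forward([-4.0, 4.0])  # the shared range widens to [-4, 4]
stale_scale = fq_a.scale   # still derived from [-1, 1]

for m in (fq_a, fq_b):     # plays the role of model.apply(recompute_scale_zp)
    recompute_scale_zp(m)
```

After the loop, fq_a and fq_b carry identical qparams derived from the widened range, while stale_scale shows what fq_a held before the refresh.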

runner.validate_concat_observer_fn(module, results, name='')

Callback function: validates that all input_observers of a ConcatObserver have consistent scale and zero_point values.

Usage:

results = []
for name, m in model.named_modules():
    validate_concat_observer_fn(m, results, name)

Parameters:
  • results (list)

  • name (str)
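One plausible shape for this consistency check is sketched below. It is a hypothetical, framework-free illustration: ToyInputObserver and ToyConcatObserver stand in for the real observer classes, the dict-per-module format of the results entries is an assumption (the real format is not documented here), and so is the relative tolerance used to compare scales.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyInputObserver:
    """Hypothetical stand-in for one input observer of a ConcatObserver."""
    scale: float
    zero_point: int


@dataclass
class ToyConcatObserver:
    """Hypothetical stand-in for ConcatObserver."""
    input_observers: list = field(default_factory=list)


def validate_concat_observer_fn(module, results, name=""):
    # Only concat observers with at least one input are checked.
    if not isinstance(module, ToyConcatObserver) or not module.input_observers:
        return
    ref = module.input_observers[0]
    # Every input must match the first one; the tolerance is an assumption.
    consistent = all(
        math.isclose(obs.scale, ref.scale, rel_tol=1e-6)
        and obs.zero_point == ref.zero_point
        for obs in module.input_observers[1:]
    )
    results.append({"name": name, "consistent": consistent})


# A dict plays the role of model.named_modules().
modules = {
    "layers.0.concat": ToyConcatObserver(
        [ToyInputObserver(0.02, 128), ToyInputObserver(0.02, 128)]
    ),
    "layers.1.concat": ToyConcatObserver(
        [ToyInputObserver(0.02, 128), ToyInputObserver(0.05, 120)]
    ),
    "layers.1.mlp": object(),  # non-concat modules are skipped
}

results = []
for name, m in modules.items():
    validate_concat_observer_fn(m, results, name)
```

Appending to a caller-owned list rather than raising lets a single pass over the model report every inconsistent concat at once.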

runner.freeze_qwen2_rmsnorm_weight(m)
runner.freeze_qwen2_linear_weight(m)
runner.freeze_qwen2_embed_tokens_weight(m)
runner.disable_qdq_observer(m)
runner.enable_qdq_observer(m)
runner.enable_fake_quant(m)
runner.disable_fake_quant(m)
runner.convert_weight(m)
class runner.Qwen2Quantizer(model_path, mllm_qualcomm_max_length=2048)
Parameters:

model_path (str)

tokenizer
model
mllm_qualcomm_max_length = 2048
freeze_activation()
enable_activation_update()
enable_fake_quant()
disable_fake_quant()
compile()
infer(prompt)
Parameters:

prompt (str)

calibrate(num_samples=64, max_seq_length=512)

Perform post-training quantization (PTQ) calibration using the Wikipedia dataset.

Parameters:
  • num_samples: Number of samples used for calibration.

  • max_seq_length: Maximum length of each sample (not exceeding mllm_qualcomm_max_length).
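The calibration method is documented only through its parameters. The sketch below shows one plausible shape of such a PTQ loop, assuming the corpus is an iterable of strings, empty samples are skipped, and max_seq_length is clamped to mllm_qualcomm_max_length. The standalone calibrate function and its tokenize and forward callables are hypothetical stand-ins, not the Qwen2Quantizer.calibrate implementation, and loading the Wikipedia dataset is not reproduced.

```python
def calibrate(texts, tokenize, forward, num_samples=64, max_seq_length=512,
              mllm_qualcomm_max_length=2048):
    """Hypothetical PTQ calibration loop over a text corpus.

    Assumes max_seq_length is clamped to mllm_qualcomm_max_length and that
    `forward` runs the model with observers enabled so they record ranges.
    Returns the number of samples actually fed to the model.
    """
    seq_len = min(max_seq_length, mllm_qualcomm_max_length)
    used = 0
    for text in texts:
        if used == num_samples:
            break
        ids = tokenize(text)
        if not ids:
            continue  # skip empty samples (an assumption)
        forward(ids[:seq_len])  # observers see at most seq_len tokens
        used += 1
    return used


# Toy usage: whitespace "tokenizer" and a forward pass that records lengths.
seen_lengths = []
corpus = ["a b c d e", "", "f g h", "i j k l m n o"]
n = calibrate(corpus, str.split, lambda ids: seen_lengths.append(len(ids)),
              num_samples=2, max_seq_length=4, mllm_qualcomm_max_length=3)
```

Here the effective length is min(4, 3) = 3, the empty string is skipped, and the loop stops after two samples, so the last text is never used.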

convert()
recompute_scale_zp()
validate_concat_observer()