pymllm.quantization.quant_config¶
Quantization configuration base class and registry.
This module provides the bridge between a model checkpoint’s quantization
metadata (e.g. quantize_config.json) and the runtime
LinearMethodBase instances used by
each linear layer.
Architecture overview:
    quantize_config.json ──parse──► QuantizationConfig subclass
                                            │
                                            │ get_quant_method(layer, prefix)
                                            ▼
                                    LinearMethodBase instance
                                    (AWQLinearMethod, FP8LinearMethod, ...)
How to add a new quantization method¶
1. Create a QuantizationConfig subclass (e.g. AWQConfig).
2. Implement get_name(), from_config(), get_quant_method().
3. Register it:

    from pymllm.quantization.quant_config import register_quantization

    @register_quantization("awq")
    class AWQConfig(QuantizationConfig):
        ...
When the server starts with --quantization.method awq, the loader calls get_quantization_config("awq") to obtain the config class, then from_config(hf_quant_config) to instantiate it, and finally config.get_quant_method(layer, prefix) for each linear layer.
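The registration steps and the load sequence above can be sketched end to end. The block below is self-contained and uses small stand-ins for the registry, QuantizationConfig, and the linear method class so that it runs without pymllm installed. The stand-in bodies and the AWQLinearMethod constructor arguments are assumptions for illustration, not pymllm's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Type

# --- Stand-ins for the pymllm names used on this page (assumed shapes) ---
_REGISTRY: Dict[str, Type["QuantizationConfig"]] = {}


def register_quantization(name: str):
    """Class decorator: store the class under `name` and return it unchanged."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_quantization_config(method: str) -> Type["QuantizationConfig"]:
    return _REGISTRY[method]  # raises KeyError for unknown names


class QuantizationConfig(ABC):
    @abstractmethod
    def get_name(self) -> str: ...

    @classmethod
    @abstractmethod
    def from_config(cls, config: Dict[str, Any]) -> "QuantizationConfig": ...

    @abstractmethod
    def get_quant_method(self, layer: Any, prefix: str = "") -> Optional[object]: ...


class AWQLinearMethod:
    """Placeholder for a LinearMethodBase implementation (hypothetical fields)."""

    def __init__(self, bits: int, group_size: int) -> None:
        self.bits = bits
        self.group_size = group_size


# --- Steps 1-3: subclass, implement the contract, register ---
@register_quantization("awq")
class AWQConfig(QuantizationConfig):
    def __init__(self, bits: int, group_size: int) -> None:
        self.bits = bits
        self.group_size = group_size

    def get_name(self) -> str:
        return "awq"

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "AWQConfig":
        return cls(bits=config["bits"], group_size=config["group_size"])

    def get_quant_method(self, layer: Any, prefix: str = "") -> Optional[AWQLinearMethod]:
        return AWQLinearMethod(self.bits, self.group_size)


# --- What the loader is described as doing at startup ---
hf_quant_config = {"quant_method": "awq", "bits": 4, "group_size": 128, "zero_point": True}
config_cls = get_quantization_config("awq")       # 1. look up the class
config = config_cls.from_config(hf_quant_config)  # 2. instantiate from metadata

# 3. ask for a method once per linear layer (layer objects omitted here)
prefixes = ["model.layers.0.self_attn.q_proj", "model.layers.0.mlp.down_proj"]
methods = {p: config.get_quant_method(layer=None, prefix=p) for p in prefixes}
```

Because the decorator returns the class unchanged, AWQConfig can still be imported and instantiated directly; the registry only adds a lookup by name.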
Classes¶
QuantizationConfig | Base class for quantization configurations.
Functions¶
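register_quantization(name) | Class decorator that registers a QuantizationConfig subclass.
get_quantization_config(method) | Look up a registered QuantizationConfig by name.
list_quantization_methods() | Return sorted list of registered quantization method names.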
Module Contents¶
- pymllm.quantization.quant_config.register_quantization(name)¶
Class decorator that registers a QuantizationConfig subclass.
Usage:

    @register_quantization("awq")
    class AWQConfig(QuantizationConfig):
        ...

- Parameters:
name (str)
- Return type:
type[type[QuantizationConfig]]
- pymllm.quantization.quant_config.get_quantization_config(method)¶
Look up a registered QuantizationConfig by name.
Raises KeyError if the method is not registered.
- Parameters:
method (str)
- Return type:
Type[QuantizationConfig]
- pymllm.quantization.quant_config.list_quantization_methods()¶
Return sorted list of registered quantization method names.
- Return type:
List[str]
- class pymllm.quantization.quant_config.QuantizationConfig¶
Bases: abc.ABC
Base class for quantization configurations.
A QuantizationConfig is instantiated once per model load. It reads quantization metadata from the checkpoint (bit-width, group size, etc.) and provides QuantizeMethodBase instances to each layer.
Subclass contract¶
- get_name(): return the method name (e.g. "awq").
- from_config(): class method that parses a dict from the checkpoint’s quantize_config.json.
- get_quant_method(): return the appropriate LinearMethodBase (or None to skip quantization for a layer).
Optional overrides¶
- get_supported_act_dtypes(): restrict activation dtypes.
- get_min_capability(): minimum GPU compute capability.
- get_config_filenames(): files to probe in the checkpoint dir.
- abstractmethod get_name()¶
Return the canonical name of this quantization method.
Examples: "awq", "gptq", "fp8", "w8a8".
- Return type:
str
- classmethod from_config(config)¶
Create an instance from a checkpoint’s quantization config dict. This class method is abstract; subclasses must implement it.
- Parameters:
config (Dict[str, Any]) – Parsed JSON from the checkpoint’s quantize_config.json or the quantization_config section of config.json.
Example config dict (AWQ):

    {
        "quant_method": "awq", "bits": 4, "group_size": 128, "zero_point": true
    }

- Return type:
QuantizationConfig
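As a rough illustration of what a from_config() implementation might look like, the sketch below parses the example dict from both sources named above. AWQStyleConfig, its fields, the required-key check, and the zero_point default are hypothetical choices made for this example; pymllm's real AWQ config may differ.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class AWQStyleConfig:
    """Hypothetical container for the fields in the example dict."""

    bits: int
    group_size: int
    zero_point: bool

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "AWQStyleConfig":
        # Fail early with a clear message if a required key is missing.
        missing = [k for k in ("bits", "group_size") if k not in config]
        if missing:
            raise ValueError(f"quantization config is missing keys: {missing}")
        return cls(
            bits=int(config["bits"]),
            group_size=int(config["group_size"]),
            # Assumed default; the real default (if any) is not documented here.
            zero_point=bool(config.get("zero_point", True)),
        )


# Case 1: the dict comes straight from quantize_config.json.
standalone = json.loads(
    '{"quant_method": "awq", "bits": 4, "group_size": 128, "zero_point": true}'
)

# Case 2: the dict is the quantization_config section of config.json.
model_config = json.loads(
    '{"hidden_size": 4096, "quantization_config": '
    '{"quant_method": "awq", "bits": 4, "group_size": 128, "zero_point": true}}'
)

cfg_a = AWQStyleConfig.from_config(standalone)
cfg_b = AWQStyleConfig.from_config(model_config["quantization_config"])
```

Both paths hand the same flat dict to from_config(), so the method itself does not need to know which file the metadata came from.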
- abstractmethod get_quant_method(layer, prefix='')¶
Return the quantization method for layer, or None to skip.
- Parameters:
layer (torch.nn.Module) – The nn.Module being constructed (e.g. ColumnParallelLinear).
prefix (str) – The layer’s full dotted name in the model (e.g. "model.layers.0.self_attn.q_proj"). Can be used to selectively skip quantization for certain layers.
- Returns:
The method instance. None means this layer should fall back to the default UnquantizedLinearMethod.
- Return type:
QuantizeMethodBase or None
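The None-means-fallback convention can be sketched as follows. Everything here is a stand-in written for this example: SkipListConfig, its skip_prefixes argument, FakeQuantLinearMethod, and the resolve_linear_method() helper are hypothetical, and the real fallback logic lives wherever pymllm constructs its linear layers.

```python
from typing import Any, Optional, Sequence


class UnquantizedLinearMethod:
    """Stand-in for the default method used when quantization is skipped."""


class FakeQuantLinearMethod:
    """Stand-in for a quantized LinearMethodBase implementation."""


class SkipListConfig:
    """Hypothetical config that skips layers whose prefix matches a skip list."""

    def __init__(self, skip_prefixes: Sequence[str]) -> None:
        self.skip_prefixes = tuple(skip_prefixes)

    def get_quant_method(self, layer: Any, prefix: str = "") -> Optional[FakeQuantLinearMethod]:
        # Returning None tells the caller to leave this layer unquantized.
        if any(prefix == p or prefix.startswith(p + ".") for p in self.skip_prefixes):
            return None
        return FakeQuantLinearMethod()


def resolve_linear_method(config: SkipListConfig, layer: Any, prefix: str) -> object:
    """Caller-side fallback: use the default method when the config returns None."""
    method = config.get_quant_method(layer, prefix)
    return method if method is not None else UnquantizedLinearMethod()


config = SkipListConfig(skip_prefixes=["lm_head", "model.layers.0.mlp"])
q_proj = resolve_linear_method(config, None, "model.layers.0.self_attn.q_proj")
gate = resolve_linear_method(config, None, "model.layers.0.mlp.gate_proj")
head = resolve_linear_method(config, None, "lm_head")
```

Matching on the exact prefix or on the prefix followed by a dot avoids accidentally skipping a sibling such as "model.layers.0.mlp_extra" when only "model.layers.0.mlp" is listed.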
- get_supported_act_dtypes()¶
Activation dtypes supported by this method.
Override to restrict (e.g. FP8 only supports float16). Default: no restriction.
- Return type:
List[torch.dtype]
- classmethod get_min_capability()¶
Minimum CUDA compute capability (e.g. 75 for Turing).
Default: 0 (no restriction).
- Return type:
int
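The example value 75 for Turing (compute capability 7.5) suggests the integer is encoded as major * 10 + minor. Under that assumption, a caller could compare the declared minimum against the device as sketched below. Both helper functions are hypothetical and not part of pymllm; how the loader actually enforces the minimum is not stated on this page.

```python
from typing import Tuple


def capability_to_int(capability: Tuple[int, int]) -> int:
    """Encode (major, minor) as major * 10 + minor, e.g. (7, 5) -> 75."""
    major, minor = capability
    return major * 10 + minor


def check_min_capability(min_capability: int, device_capability: Tuple[int, int]) -> None:
    """Raise if the device is older than the method's declared minimum."""
    have = capability_to_int(device_capability)
    if have < min_capability:
        raise RuntimeError(
            f"quantization method needs compute capability >= {min_capability}, "
            f"but the device reports {have}"
        )


# With PyTorch, the tuple could come from torch.cuda.get_device_capability().
check_min_capability(75, (8, 0))  # compute capability 8.0 against a 7.5 minimum: passes
check_min_capability(0, (6, 1))   # the default of 0 never restricts
```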
- static get_config_filenames()¶
File names to look for in the checkpoint directory.
Default: ["quantize_config.json"].
- Return type:
List[str]