pymllm.quantization.quant_config
================================

.. py:module:: pymllm.quantization.quant_config

.. autoapi-nested-parse::

   Quantization configuration base class and registry.

   This module provides the bridge between a model checkpoint's quantization
   metadata (e.g. ``quantize_config.json``) and the runtime
   :class:`~pymllm.layers.quantize_base.LinearMethodBase` instances used by each
   linear layer.

   Architecture overview::

       quantize_config.json  ──parse──►  QuantizationConfig subclass
                                                │
                                                │  get_quant_method(layer, prefix)
                                                ▼
                                         LinearMethodBase instance
                                         (AWQLinearMethod, FP8LinearMethod, ...)

   How to add a new quantization method
   -------------------------------------

   1. Create a ``QuantizationConfig`` subclass (e.g. ``AWQConfig``).
   2. Implement ``get_name()``, ``from_config()``, and ``get_quant_method()``.
   3. Register it::

          from pymllm.quantization.quant_config import (
              QuantizationConfig,
              register_quantization,
          )

          @register_quantization("awq")
          class AWQConfig(QuantizationConfig):
              ...

   4. When the server starts with ``--quantization.method awq``, the loader
      calls ``get_quantization_config("awq")`` to obtain the config class,
      then ``from_config(hf_quant_config)`` to instantiate it, and finally
      ``config.get_quant_method(layer, prefix)`` for each linear layer.


Classes
-------

.. autoapisummary::

   pymllm.quantization.quant_config.QuantizationConfig


Functions
---------

.. autoapisummary::

   pymllm.quantization.quant_config.register_quantization
   pymllm.quantization.quant_config.get_quantization_config
   pymllm.quantization.quant_config.list_quantization_methods


Module Contents
---------------

.. py:function:: register_quantization(name)

   Class decorator that registers a :class:`QuantizationConfig` subclass.

   Usage::

       @register_quantization("awq")
       class AWQConfig(QuantizationConfig):
           ...


.. py:function:: get_quantization_config(method)

   Look up a registered :class:`QuantizationConfig` by name.

   Raises ``KeyError`` if the method is not registered.


.. py:function:: list_quantization_methods()

   Return a sorted list of registered quantization method names.


.. py:class:: QuantizationConfig

   Bases: :py:obj:`abc.ABC`

   Base class for quantization configurations.

   A ``QuantizationConfig`` is instantiated once per model load. It reads
   quantization metadata from the checkpoint (bit-width, group size, etc.)
   and provides :class:`~pymllm.layers.quantize_base.QuantizeMethodBase`
   instances to each layer.

   Subclass contract
   -----------------

   * :meth:`get_name`: return the method name (e.g. ``"awq"``).
   * :meth:`from_config`: class method that parses a dict from the
     checkpoint's ``quantize_config.json``.
   * :meth:`get_quant_method`: return the appropriate
     ``LinearMethodBase`` (or ``None`` to skip quantization for a layer).

   Optional overrides
   ------------------

   * :meth:`get_supported_act_dtypes`: restrict activation dtypes.
   * :meth:`get_min_capability`: minimum GPU compute capability.
   * :meth:`get_config_filenames`: files to probe in the checkpoint dir.

   .. py:method:: get_name()
      :abstractmethod:

      Return the canonical name of this quantization method.

      Examples: ``"awq"``, ``"gptq"``, ``"fp8"``, ``"w8a8"``.

   .. py:method:: from_config(config)
      :classmethod:
      :abstractmethod:

      Create an instance from a checkpoint's quantization config dict.

      :param config: Parsed JSON from the checkpoint's ``quantize_config.json``
                     or the ``quantization_config`` section of ``config.json``.

      Example config dict (AWQ)::

          {
              "quant_method": "awq",
              "bits": 4,
              "group_size": 128,
              "zero_point": true
          }

   .. py:method:: get_quant_method(layer, prefix = '')
      :abstractmethod:

      Return the quantization method for *layer*, or ``None`` to skip.

      :param layer: The ``nn.Module`` being constructed (e.g. ``ColumnParallelLinear``).
      :param prefix: The layer's full dotted name in the model (e.g.
                     ``"model.layers.0.self_attn.q_proj"``). Can be used to
                     selectively skip quantization for certain layers.

      :returns: The method instance. ``None`` means this layer should fall back
                to the default
                :class:`~pymllm.layers.quantize_base.UnquantizedLinearMethod`.
      :rtype: QuantizeMethodBase or None

   .. py:method:: get_supported_act_dtypes()

      Activation dtypes supported by this method.

      Override to restrict (e.g. FP8 only supports ``float16``).
      Default: no restriction.

   .. py:method:: get_min_capability()
      :classmethod:

      Minimum CUDA compute capability (e.g. 75 for Turing).

      Default: 0 (no restriction).

   .. py:method:: get_config_filenames()
      :staticmethod:

      File names to look for in the checkpoint directory.

      Default: ``["quantize_config.json"]``.
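
The registration and lookup flow described above can be sketched in plain Python. This is a minimal, self-contained sketch, not pymllm's implementation: the registry dict, the re-declared base class, ``ToyConfig``, ``ToyLinearMethod``, and the ``lm_head`` skip rule are all hypothetical stand-ins that mirror the documented names so the snippet runs without pymllm installed. The real internals may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Type

# Stand-in registry (assumption: the real module keeps a similar name -> class map).
_REGISTRY: Dict[str, Type["QuantizationConfig"]] = {}


def register_quantization(name: str):
    """Class decorator that stores the decorated class under ``name``."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_quantization_config(method: str) -> Type["QuantizationConfig"]:
    # A plain dict lookup raises KeyError for unknown names, matching the docs.
    return _REGISTRY[method]


def list_quantization_methods() -> List[str]:
    return sorted(_REGISTRY)


class QuantizationConfig(ABC):
    """Stand-in for the documented base class (required methods only)."""

    @abstractmethod
    def get_name(self) -> str: ...

    @classmethod
    @abstractmethod
    def from_config(cls, config: Dict[str, Any]) -> "QuantizationConfig": ...

    @abstractmethod
    def get_quant_method(self, layer: Any, prefix: str = "") -> Optional[Any]: ...


class ToyLinearMethod:
    """Hypothetical placeholder for a LinearMethodBase subclass."""

    def __init__(self, bits: int, group_size: int) -> None:
        self.bits = bits
        self.group_size = group_size


@register_quantization("toy")
class ToyConfig(QuantizationConfig):
    def __init__(self, bits: int, group_size: int) -> None:
        self.bits = bits
        self.group_size = group_size

    def get_name(self) -> str:
        return "toy"

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ToyConfig":
        # Read the checkpoint metadata; the 128 default is an assumption.
        return cls(bits=config["bits"], group_size=config.get("group_size", 128))

    def get_quant_method(self, layer: Any, prefix: str = "") -> Optional[ToyLinearMethod]:
        # Hypothetical rule: leave the output head unquantized by returning None.
        if prefix.endswith("lm_head"):
            return None
        return ToyLinearMethod(self.bits, self.group_size)


# Loader flow from step 4: look up the class, instantiate it, query it per layer.
config_cls = get_quantization_config("toy")
config = config_cls.from_config({"quant_method": "toy", "bits": 4, "group_size": 128})
method = config.get_quant_method(layer=object(), prefix="model.layers.0.self_attn.q_proj")
skipped = config.get_quant_method(layer=object(), prefix="lm_head")
```

Returning ``None`` from ``get_quant_method`` is how a config opts a layer out of quantization. Per the documentation above, the caller is then expected to fall back to the unquantized method.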