pymllm.quantization.quant_config
================================

.. py:module:: pymllm.quantization.quant_config

.. autoapi-nested-parse::

   Quantization configuration base class and registry.

   This module provides the bridge between a model checkpoint's quantization
   metadata (e.g. ``quantize_config.json``) and the runtime
   :class:`~pymllm.layers.quantize_base.LinearMethodBase` instances used by
   each linear layer.

   Architecture overview::

       quantize_config.json   ──parse──►  QuantizationConfig subclass
                                             │
                                             │  get_quant_method(layer, prefix)
                                             ▼
                                        LinearMethodBase instance
                                         (AWQLinearMethod, FP8LinearMethod, ...)

   How to add a new quantization method
   -------------------------------------
   1. Create a ``QuantizationConfig`` subclass (e.g. ``AWQConfig``).
   2. Implement ``get_name()``, ``from_config()``, ``get_quant_method()``.
   3. Register it::

          from pymllm.quantization.quant_config import (
              QuantizationConfig,
              register_quantization,
          )

          @register_quantization("awq")
          class AWQConfig(QuantizationConfig):
              ...

   4. When the server starts with ``--quantization.method awq``, the loader
      will call ``get_quantization_config("awq")`` to obtain the config class,
      then ``from_config(hf_quant_config)`` to instantiate it, and finally
      ``config.get_quant_method(layer, prefix)`` for each linear layer.
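
   The steps above can be sketched end to end. This is a minimal,
   self-contained illustration, not the actual pymllm implementation: the
   registry dict, the simplified ``QuantizationConfig`` base, ``ToyConfig`` and
   ``ToyLinearMethod`` are all hypothetical stand-ins that only mirror the
   names documented in this module.

   .. code-block:: python

      from abc import ABC, abstractmethod

      # Stand-in registry (assumption): the real one lives in
      # pymllm.quantization.quant_config and may be implemented differently.
      _REGISTRY = {}


      def register_quantization(name):
          """Class decorator that stores the decorated class under *name*."""
          def decorator(cls):
              _REGISTRY[name] = cls
              return cls
          return decorator


      def get_quantization_config(method):
          """Return the registered class; raises KeyError if unknown."""
          return _REGISTRY[method]


      class QuantizationConfig(ABC):
          """Simplified stand-in for the documented base class."""

          @abstractmethod
          def get_name(self): ...

          @classmethod
          @abstractmethod
          def from_config(cls, config): ...

          @abstractmethod
          def get_quant_method(self, layer, prefix=""): ...


      class ToyLinearMethod:
          """Hypothetical placeholder for a LinearMethodBase subclass."""

          def __init__(self, bits):
              self.bits = bits


      @register_quantization("toy")
      class ToyConfig(QuantizationConfig):
          def __init__(self, bits):
              self.bits = bits

          def get_name(self):
              return "toy"

          @classmethod
          def from_config(cls, config):
              return cls(bits=config["bits"])

          def get_quant_method(self, layer, prefix=""):
              return ToyLinearMethod(self.bits)


      # Loader flow: name -> config class -> config instance -> per-layer method.
      config_cls = get_quantization_config("toy")
      config = config_cls.from_config({"quant_method": "toy", "bits": 4})
      method = config.get_quant_method(
          layer=None, prefix="model.layers.0.self_attn.q_proj"
      )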


Classes
-------

.. autoapisummary::

   pymllm.quantization.quant_config.QuantizationConfig


Functions
---------

.. autoapisummary::

   pymllm.quantization.quant_config.register_quantization
   pymllm.quantization.quant_config.get_quantization_config
   pymllm.quantization.quant_config.list_quantization_methods


Module Contents
---------------

.. py:function:: register_quantization(name)

   Class decorator that registers a :class:`QuantizationConfig` subclass under *name*.

   Usage::

       @register_quantization("awq")
       class AWQConfig(QuantizationConfig):
           ...


.. py:function:: get_quantization_config(method)

   Look up a registered :class:`QuantizationConfig` by name.

   Raises ``KeyError`` if the method is not registered.


.. py:function:: list_quantization_methods()

   Return a sorted list of the registered quantization method names.
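
   A caller may want to turn the ``KeyError`` from
   :func:`get_quantization_config` into a message that lists the valid
   choices. The sketch below shows one way to do that. It uses a stand-in
   registry so that it runs on its own; ``resolve_or_explain`` and the
   registry contents are hypothetical and not part of this module.

   .. code-block:: python

      # Stand-in registry (assumption); placeholder values instead of real classes.
      _REGISTRY = {"fp8": object, "awq": object}


      def get_quantization_config(method):
          return _REGISTRY[method]  # raises KeyError for unknown names


      def list_quantization_methods():
          return sorted(_REGISTRY)


      def resolve_or_explain(method):
          """Look up *method*, reporting the registered names on failure."""
          try:
              return get_quantization_config(method)
          except KeyError:
              available = ", ".join(list_quantization_methods())
              raise ValueError(
                  f"Unknown quantization method {method!r}; "
                  f"registered methods: {available}"
              ) from None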


.. py:class:: QuantizationConfig

   Bases: :py:obj:`abc.ABC`


   Base class for quantization configurations.

   A ``QuantizationConfig`` is instantiated once per model load.  It reads
   quantization metadata from the checkpoint (bit-width, group size, etc.)
   and provides :class:`~pymllm.layers.quantize_base.QuantizeMethodBase`
   instances to each layer.

   Subclass contract
   -----------------
   * :meth:`get_name` — return the method name (e.g. ``"awq"``).
   * :meth:`from_config` — class method that parses a dict from the
     checkpoint's ``quantize_config.json``.
   * :meth:`get_quant_method` — return the appropriate
     ``LinearMethodBase`` (or ``None`` to skip quantization for a layer).

   Optional overrides
   ------------------
   * :meth:`get_supported_act_dtypes` — restrict activation dtypes.
   * :meth:`get_min_capability` — minimum GPU compute capability.
   * :meth:`get_config_filenames` — files to probe in the checkpoint dir.


   .. py:method:: get_name()
      :abstractmethod:


      Return the canonical name of this quantization method.

      Examples: ``"awq"``, ``"gptq"``, ``"fp8"``, ``"w8a8"``.


   .. py:method:: from_config(config)
      :classmethod:
      :abstractmethod:


      Create an instance from a checkpoint's quantization config dict.

      :param config: Parsed JSON from the checkpoint's ``quantize_config.json`` or
                     the ``quantization_config`` section of ``config.json``.

      Example config dict (AWQ)::

          {
              "quant_method": "awq",
              "bits": 4,
              "group_size": 128,
              "zero_point": true
          }
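
      As an illustration of what a ``from_config`` implementation might do
      with such a dict, the sketch below parses it into a small dataclass.
      ``AWQSettings`` and its validation rule are hypothetical; the real
      AWQ config class may store and check these fields differently.

      .. code-block:: python

         import json
         from dataclasses import dataclass


         @dataclass
         class AWQSettings:
             """Hypothetical container for the fields shown in the example dict."""

             bits: int
             group_size: int
             zero_point: bool

             @classmethod
             def from_config(cls, config):
                 # Reject dicts that describe a different quantization method.
                 method = config.get("quant_method")
                 if method != "awq":
                     raise ValueError(f"expected quant_method 'awq', got {method!r}")
                 return cls(
                     bits=int(config["bits"]),
                     group_size=int(config["group_size"]),
                     zero_point=bool(config["zero_point"]),
                 )


         raw = '{"quant_method": "awq", "bits": 4, "group_size": 128, "zero_point": true}'
         settings = AWQSettings.from_config(json.loads(raw))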


   .. py:method:: get_quant_method(layer, prefix = '')
      :abstractmethod:


      Return the quantization method for *layer*, or ``None`` to skip.

      :param layer: The ``nn.Module`` being constructed (e.g. ``ColumnParallelLinear``).
      :param prefix: The layer's full dotted name in the model (e.g.
                     ``"model.layers.0.self_attn.q_proj"``).  Can be used to
                     selectively skip quantization for certain layers.

      :returns: The method instance.  ``None`` means this layer should fall back
                to the default :class:`~pymllm.layers.quantize_base.UnquantizedLinearMethod`.
      :rtype: QuantizeMethodBase or None
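
      The *prefix* argument makes per-layer decisions possible. A minimal
      sketch of such a decision, assuming a hypothetical helper
      ``should_quantize`` and an illustrative skip list (neither is defined by
      this module); an implementation of ``get_quant_method`` could return
      ``None`` whenever the helper returns ``False``:

      .. code-block:: python

         def should_quantize(prefix, skip_names=("lm_head", "embed_tokens")):
             """Return False when the last component of *prefix* is in *skip_names*."""
             last_component = prefix.rsplit(".", 1)[-1]
             return last_component not in skip_names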


   .. py:method:: get_supported_act_dtypes()

      Activation dtypes supported by this method.

      Override to restrict (e.g. FP8 only supports ``float16``).
      Default: no restriction.


   .. py:method:: get_min_capability()
      :classmethod:


      Minimum CUDA compute capability (e.g. 75 for Turing, i.e. compute capability 7.5).

      Default: 0 (no restriction).
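
      The integer form appears to pack the version as ``major * 10 + minor``,
      which is an assumption inferred from the Turing example rather than
      something this page states. Under that assumption, a loader-side check
      could look like the sketch below; ``encode_capability`` and
      ``is_supported`` are hypothetical helpers.

      .. code-block:: python

         def encode_capability(major, minor):
             """Pack (major, minor) into one integer, e.g. (7, 5) -> 75."""
             return major * 10 + minor


         def is_supported(device_capability, min_capability):
             """Compare a (major, minor) tuple against an encoded minimum.

             The tuple has the shape returned by
             ``torch.cuda.get_device_capability()``.
             """
             major, minor = device_capability
             return encode_capability(major, minor) >= min_capability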


   .. py:method:: get_config_filenames()
      :staticmethod:


      File names to look for in the checkpoint directory.

      Default: ``["quantize_config.json"]``.
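
      How the returned names might be used can be sketched as a simple probe
      of the checkpoint directory. ``find_quant_config`` is a hypothetical
      helper, and returning the first match is an assumption about the
      loader's behavior, not documented here.

      .. code-block:: python

         import json
         import tempfile
         from pathlib import Path


         def find_quant_config(checkpoint_dir, filenames=("quantize_config.json",)):
             """Return the parsed JSON of the first existing file, else None."""
             root = Path(checkpoint_dir)
             for name in filenames:
                 candidate = root / name
                 if candidate.is_file():
                     with candidate.open(encoding="utf-8") as fh:
                         return json.load(fh)
             return None


         # Demonstration against a temporary directory standing in for a checkpoint.
         with tempfile.TemporaryDirectory() as tmp:
             (Path(tmp) / "quantize_config.json").write_text(
                 '{"quant_method": "awq", "bits": 4}', encoding="utf-8"
             )
             found = find_quant_config(tmp)
             missing = find_quant_config(tmp, filenames=("other.json",))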