pymllm.layers.linear
====================

.. py:module:: pymllm.layers.linear

.. autoapi-nested-parse::

   Linear layers with quantization method dispatch.

   Every linear layer holds a ``quant_method`` attribute (an instance of
   :class:`~pymllm.layers.quantize_base.LinearMethodBase`). When no
   quantization is configured, :class:`UnquantizedLinearMethod` is used as
   the default — it creates a standard FP weight and forwards via
   ``F.linear``. Quantized checkpoints plug in a different
   ``LinearMethodBase`` (e.g. ``AWQLinearMethod``) which creates packed
   int4 weights, scales, and zero-points, and overrides :meth:`apply`
   with a fused dequant+matmul kernel.

   Usage in model definitions::

       # Non-quantized (default)
       layer = ColumnParallelLinear(4096, 4096)

       # Quantized — pass a quant_method from QuantizationConfig
       qm = awq_config.get_quant_method(layer, prefix="model.layers.0.q_proj")
       layer = ColumnParallelLinear(4096, 4096, quant_method=qm)

Classes
-------

.. autoapisummary::

   pymllm.layers.linear.ColumnParallelLinear
   pymllm.layers.linear.RowParallelLinear
   pymllm.layers.linear.Linear

Module Contents
---------------

.. py:class:: ColumnParallelLinear(in_features, out_features, bias=True, gather_output=True, quant_method=None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Linear layer with column parallelism (output-dimension sharding).

   The weight matrix is split along the output dimension across TP ranks.
   Each rank holds ``out_features / tp_size`` rows of the weight.

   :param in_features: Size of each input sample.
   :param out_features: Size of each output sample (before sharding).
   :param bias: If ``True``, adds a learnable bias.
   :param gather_output: If ``True``, all-gather the output across TP
       ranks so every rank gets the full ``out_features``. Set to
       ``False`` when the next layer is a :class:`RowParallelLinear`
       that expects a split input.
   :param quant_method: Quantization method instance. ``None`` →
       :class:`UnquantizedLinearMethod`.

   .. py:attribute:: tp_rank
      :value: 0

   .. py:attribute:: tp_size
      :value: 1

   .. py:attribute:: in_features

   .. py:attribute:: out_features

   .. py:attribute:: gather_output
      :value: True

   .. py:attribute:: out_features_per_partition

   .. py:attribute:: output_start_index

   .. py:attribute:: output_end_index

   .. py:attribute:: quant_method

   .. py:method:: weight_loader(param, loaded_weight)

      Load sharded weights into the parameter.

      :param param: The parameter to load weights into.
      :param loaded_weight: The weight tensor loaded from checkpoint
          (full size).

   .. py:method:: forward(x)

.. py:class:: RowParallelLinear(in_features, out_features, bias=True, reduce_output=True, quant_method=None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Linear layer with row parallelism (input-dimension sharding).

   The weight matrix is split along the input dimension across TP ranks.
   Each rank holds all ``out_features`` rows but only
   ``in_features / tp_size`` columns. Typically placed after a
   :class:`ColumnParallelLinear` whose ``gather_output=False``, so the
   input is already split.

   :param in_features: Size of each input sample (before sharding).
   :param out_features: Size of each output sample.
   :param bias: If ``True``, adds a learnable bias (applied after the
       all-reduce).
   :param reduce_output: If ``True``, all-reduce the output across TP
       ranks.
   :param quant_method: Quantization method instance. ``None`` →
       :class:`UnquantizedLinearMethod`.

   .. py:attribute:: tp_rank
      :value: 0

   .. py:attribute:: tp_size
      :value: 1

   .. py:attribute:: in_features

   .. py:attribute:: out_features

   .. py:attribute:: reduce_output
      :value: True

   .. py:attribute:: in_features_per_partition

   .. py:attribute:: input_start_index

   .. py:attribute:: input_end_index

   .. py:attribute:: quant_method

   .. py:method:: weight_loader(param, loaded_weight)

      Load sharded weights into the parameter.

      :param param: The parameter to load weights into.
      :param loaded_weight: The weight tensor loaded from checkpoint
          (full size).

   .. py:method:: forward(x)

.. py:class:: Linear(in_features, out_features, bias=True, quant_method=None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Non-parallel linear layer with quantization dispatch.

   :param in_features: Size of each input sample.
   :param out_features: Size of each output sample.
   :param bias: If ``True``, adds a learnable bias.
   :param quant_method: Quantization method instance. ``None`` →
       :class:`UnquantizedLinearMethod`.

   .. py:attribute:: in_features

   .. py:attribute:: out_features

   .. py:attribute:: quant_method

   .. py:method:: forward(x)
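The ``quant_method`` dispatch described in the module docstring can be
sketched in miniature. This is an illustrative toy, not the library's
implementation: the real ``LinearMethodBase`` and
``UnquantizedLinearMethod`` live in ``pymllm.layers.quantize_base`` and
use ``torch.nn.Parameter`` and ``F.linear``; the NumPy bodies and exact
method signatures below are assumptions made for the sake of a
self-contained example.

.. code-block:: python

   # Toy sketch of the quant_method dispatch pattern (NumPy stands in
   # for torch; signatures are illustrative, not pymllm's actual API).
   import numpy as np


   class LinearMethodBase:
       """Creates a layer's weights and performs its forward matmul."""

       def create_weights(self, layer, in_features, out_features):
           raise NotImplementedError

       def apply(self, layer, x):
           raise NotImplementedError


   class UnquantizedLinearMethod(LinearMethodBase):
       # Default method: a plain floating-point weight and a plain
       # matmul (the real class forwards via F.linear).
       def create_weights(self, layer, in_features, out_features):
           layer.weight = np.zeros((out_features, in_features),
                                   dtype=np.float32)

       def apply(self, layer, x):
           return x @ layer.weight.T


   class Linear:
       # Mirrors the docs: quant_method=None falls back to the
       # unquantized method, and forward() delegates to apply(), so a
       # quantized method can swap in packed weights and a fused kernel
       # without touching the layer class.
       def __init__(self, in_features, out_features, quant_method=None):
           self.quant_method = quant_method or UnquantizedLinearMethod()
           self.quant_method.create_weights(self, in_features, out_features)

       def forward(self, x):
           return self.quant_method.apply(self, x)


   layer = Linear(4, 3)
   layer.weight[:] = 1.0
   out = layer.forward(np.ones((2, 4), dtype=np.float32))
   assert out.shape == (2, 3) and np.allclose(out, 4.0)

A quantized method would follow the same two-hook shape: its
``create_weights`` registers packed int4 tensors plus scales and
zero-points, and its ``apply`` runs the fused dequant+matmul.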
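The column-parallel → row-parallel composition (``gather_output=False``
feeding ``reduce_output=True``) can be verified numerically. The sketch
below simulates TP ranks with NumPy array slices — no real distributed
setup, and no pymllm code — to show why the partial outputs of
``RowParallelLinear`` sum (the job of the all-reduce) to the unsharded
result.

.. code-block:: python

   # Sketch: ColumnParallelLinear -> RowParallelLinear recovers the
   # full two-layer matmul. "Ranks" are simulated by array slices.
   import numpy as np

   rng = np.random.default_rng(0)
   tp_size = 4
   in_f, hidden, out_f = 8, 16, 8

   x = rng.standard_normal((2, in_f))         # (batch, in_features)
   w1 = rng.standard_normal((hidden, in_f))   # column-parallel weight
   w2 = rng.standard_normal((out_f, hidden))  # row-parallel weight

   # Reference: unsharded forward, y = (x @ w1.T) @ w2.T
   y_ref = (x @ w1.T) @ w2.T

   # Column parallel: split w1 along the output dim (its rows). Each
   # rank computes hidden/tp_size features; gather_output=False keeps
   # the activation split.
   w1_shards = np.split(w1, tp_size, axis=0)
   h_shards = [x @ w.T for w in w1_shards]

   # Row parallel: split w2 along the input dim (its columns). Each
   # rank consumes its matching activation slice; summing the partial
   # outputs is exactly what reduce_output's all-reduce does.
   w2_shards = np.split(w2, tp_size, axis=1)
   partials = [h @ w.T for h, w in zip(h_shards, w2_shards)]
   y = sum(partials)

   assert np.allclose(y, y_ref)

This is why the attention and MLP blocks of a transformer need only one
collective per block: the split hidden activation never has to be
gathered between the two linear layers.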