pymllm.layers.linear
====================

.. py:module:: pymllm.layers.linear

.. autoapi-nested-parse::

   Linear layers with quantization method dispatch.

   Every linear layer holds a ``quant_method`` attribute (an instance of
   :class:`~pymllm.layers.quantize_base.LinearMethodBase`). When no
   quantization is configured, :class:`UnquantizedLinearMethod` is used as
   the default — it creates a standard FP weight and forwards via
   ``F.linear``. Quantized checkpoints plug in a different
   ``LinearMethodBase`` (e.g. ``AWQLinearMethod``) which creates packed
   int4 weights, scales, and zero-points, and overrides :meth:`apply`
   with a fused dequant+matmul kernel.

   Usage in model definitions::

       # Non-quantized (default)
       layer = ColumnParallelLinear(4096, 4096)

       # Quantized — pass a quant_method from QuantizationConfig
       qm = awq_config.get_quant_method(layer, prefix="model.layers.0.q_proj")
       layer = ColumnParallelLinear(4096, 4096, quant_method=qm)

Classes
-------

.. autoapisummary::

   pymllm.layers.linear.ColumnParallelLinear
   pymllm.layers.linear.RowParallelLinear
   pymllm.layers.linear.Linear

Module Contents
---------------

.. py:class:: ColumnParallelLinear(in_features, out_features, bias=True, gather_output=True, quant_method=None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Linear layer with column parallelism (output-dimension sharding).

   The weight matrix is split along the output dimension across TP ranks.
   Each rank holds ``out_features / tp_size`` rows of the weight.

   :param in_features: Size of each input sample.
   :param out_features: Size of each output sample (before sharding).
   :param bias: If ``True``, adds a learnable bias.
   :param gather_output: If ``True``, all-gather the output across TP
       ranks so every rank gets the full ``out_features``. Set to
       ``False`` when the next layer is a :class:`RowParallelLinear`
       that expects a split input.
   :param quant_method: Quantization method instance. ``None`` →
       :class:`UnquantizedLinearMethod`.

   .. py:attribute:: tp_rank
      :value: 0

   .. py:attribute:: tp_size
      :value: 1

   .. py:attribute:: in_features

   .. py:attribute:: out_features

   .. py:attribute:: gather_output
      :value: True

   .. py:attribute:: out_features_per_partition

   .. py:attribute:: output_start_index

   .. py:attribute:: output_end_index

   .. py:attribute:: quant_method

   .. py:method:: weight_loader(param, loaded_weight)

      Load sharded weights into the parameter.

      :param param: The parameter to load weights into.
      :param loaded_weight: The weight tensor loaded from checkpoint
          (full size).

   .. py:method:: forward(x)

.. py:class:: RowParallelLinear(in_features, out_features, bias=True, reduce_output=True, quant_method=None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Linear layer with row parallelism (input-dimension sharding).

   The weight matrix is split along the input dimension across TP ranks.
   Each rank holds all ``out_features`` rows but only
   ``in_features / tp_size`` columns. Typically placed after a
   :class:`ColumnParallelLinear` whose ``gather_output=False``, so the
   input is already split.

   :param in_features: Size of each input sample (before sharding).
   :param out_features: Size of each output sample.
   :param bias: If ``True``, adds a learnable bias (applied after the
       all-reduce).
   :param reduce_output: If ``True``, all-reduce the output across TP
       ranks.
   :param quant_method: Quantization method instance. ``None`` →
       :class:`UnquantizedLinearMethod`.

   .. py:attribute:: tp_rank
      :value: 0

   .. py:attribute:: tp_size
      :value: 1

   .. py:attribute:: in_features

   .. py:attribute:: out_features

   .. py:attribute:: reduce_output
      :value: True

   .. py:attribute:: in_features_per_partition

   .. py:attribute:: input_start_index

   .. py:attribute:: input_end_index

   .. py:attribute:: quant_method

   .. py:method:: weight_loader(param, loaded_weight)

      Load sharded weights into the parameter.

      :param param: The parameter to load weights into.
      :param loaded_weight: The weight tensor loaded from checkpoint
          (full size).

   .. py:method:: forward(x)

.. py:class:: Linear(in_features, out_features, bias=True, quant_method=None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Non-parallel linear layer with quantization dispatch.

   :param in_features: Size of each input sample.
   :param out_features: Size of each output sample.
   :param bias: If ``True``, adds a learnable bias.
   :param quant_method: Quantization method instance. ``None`` →
       :class:`UnquantizedLinearMethod`.

   .. py:attribute:: in_features

   .. py:attribute:: out_features

   .. py:attribute:: quant_method

   .. py:method:: forward(x)
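The ``quant_method`` dispatch described in the module docstring can be
sketched in miniature. This is an illustrative toy, not the library's
implementation: the real ``LinearMethodBase`` and
``UnquantizedLinearMethod`` live in ``pymllm.layers.quantize_base`` and
use ``torch.nn.Parameter`` and ``F.linear``; the NumPy bodies and exact
method signatures below are assumptions made for the sake of a
self-contained example.

.. code-block:: python

   # Toy sketch of the quant_method dispatch pattern (NumPy stands in
   # for torch; signatures are illustrative, not pymllm's actual API).
   import numpy as np


   class LinearMethodBase:
       """Creates a layer's weights and performs its forward matmul."""

       def create_weights(self, layer, in_features, out_features):
           raise NotImplementedError

       def apply(self, layer, x):
           raise NotImplementedError


   class UnquantizedLinearMethod(LinearMethodBase):
       # Default method: a plain floating-point weight and a plain
       # matmul (the real class forwards via F.linear).
       def create_weights(self, layer, in_features, out_features):
           layer.weight = np.zeros((out_features, in_features),
                                   dtype=np.float32)

       def apply(self, layer, x):
           return x @ layer.weight.T


   class Linear:
       # Mirrors the docs: quant_method=None falls back to the
       # unquantized method, and forward() delegates to apply(), so a
       # quantized method can swap in packed weights and a fused kernel
       # without touching the layer class.
       def __init__(self, in_features, out_features, quant_method=None):
           self.quant_method = quant_method or UnquantizedLinearMethod()
           self.quant_method.create_weights(self, in_features, out_features)

       def forward(self, x):
           return self.quant_method.apply(self, x)


   layer = Linear(4, 3)
   layer.weight[:] = 1.0
   out = layer.forward(np.ones((2, 4), dtype=np.float32))
   assert out.shape == (2, 3) and np.allclose(out, 4.0)

A quantized method would follow the same two-hook shape: its
``create_weights`` registers packed int4 tensors plus scales and
zero-points, and its ``apply`` runs the fused dequant+matmul.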
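The column-parallel → row-parallel composition (``gather_output=False``
feeding ``reduce_output=True``) can be verified numerically. The sketch
below simulates TP ranks with NumPy array slices — no real distributed
setup, and no pymllm code — to show why the partial outputs of
``RowParallelLinear`` sum (the job of the all-reduce) to the unsharded
result.

.. code-block:: python

   # Sketch: ColumnParallelLinear -> RowParallelLinear recovers the
   # full two-layer matmul. "Ranks" are simulated by array slices.
   import numpy as np

   rng = np.random.default_rng(0)
   tp_size = 4
   in_f, hidden, out_f = 8, 16, 8

   x = rng.standard_normal((2, in_f))         # (batch, in_features)
   w1 = rng.standard_normal((hidden, in_f))   # column-parallel weight
   w2 = rng.standard_normal((out_f, hidden))  # row-parallel weight

   # Reference: unsharded forward, y = (x @ w1.T) @ w2.T
   y_ref = (x @ w1.T) @ w2.T

   # Column parallel: split w1 along the output dim (its rows). Each
   # rank computes hidden/tp_size features; gather_output=False keeps
   # the activation split.
   w1_shards = np.split(w1, tp_size, axis=0)
   h_shards = [x @ w.T for w in w1_shards]

   # Row parallel: split w2 along the input dim (its columns). Each
   # rank consumes its matching activation slice; summing the partial
   # outputs is exactly what reduce_output's all-reduce does.
   w2_shards = np.split(w2, tp_size, axis=1)
   partials = [h @ w.T for h, w in zip(h_shards, w2_shards)]
   y = sum(partials)

   assert np.allclose(y, y_ref)

This is why the attention and MLP blocks of a transformer need only one
collective per block: the split hidden activation never has to be
gathered between the two linear layers.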