pymllm.layers.rms_norm_gated
============================

.. py:module:: pymllm.layers.rms_norm_gated

.. autoapi-nested-parse::

   Gated RMSNorm layer for Qwen3.5 GDN attention.

   Computes ``rmsnorm(x, weight, eps) * silu(z)`` using a fused CUDA kernel
   from mllm-kernel. Falls back to a pure-PyTorch implementation when the
   kernel is unavailable.


Attributes
----------

.. autoapisummary::

   pymllm.layers.rms_norm_gated.logger


Classes
-------

.. autoapisummary::

   pymllm.layers.rms_norm_gated.RMSNormGated


Functions
---------

.. autoapisummary::

   pymllm.layers.rms_norm_gated.rms_norm_gated


Module Contents
---------------

.. py:data:: logger

.. py:function:: rms_norm_gated(x, weight, z = None, eps = 1e-06, norm_before_gate = True)

   Compute (optionally gated) RMS normalization.

   Uses the fused mllm-kernel CUDA implementation when available; otherwise
   falls back to a pure-PyTorch implementation.

.. py:class:: RMSNormGated(hidden_size, eps = 1e-06, group_size = None, norm_before_gate = True, device = None, dtype = None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Gated RMS Normalization layer for Qwen3.5 GDN attention.

   Computes::

      output = rmsnorm(x, weight) * silu(z)   # z is not None
      output = rmsnorm(x, weight)             # z is None

   Uses a fused CUDA kernel from mllm-kernel for maximum throughput.

   :param hidden_size: Dimensionality of the input (and of the weight vector).
   :type hidden_size: int
   :param eps: Small constant for numerical stability.
   :type eps: float
   :param norm_before_gate: If ``True`` (default): ``rmsnorm(x) * silu(z)``.
       If ``False``: ``rmsnorm(x * silu(z))``.
   :type norm_before_gate: bool

   .. py:attribute:: hidden_size

   .. py:attribute:: eps
      :value: 1e-06

   .. py:attribute:: norm_before_gate
      :value: True

   .. py:attribute:: weight

   .. py:method:: forward(x, z = None)

   .. py:method:: extra_repr()
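To make the semantics above concrete, the following is a minimal pure-PyTorch sketch of the fallback computation — ``rmsnorm(x, weight) * silu(z)``, with ``norm_before_gate`` controlling whether the gate is applied after or before normalization. The function name ``rms_norm_gated_ref`` is hypothetical; this is a reference illustration, not the actual mllm-kernel fallback, whose dtype handling and ``group_size`` support may differ.

```python
import torch
import torch.nn.functional as F


def rms_norm_gated_ref(x, weight, z=None, eps=1e-6, norm_before_gate=True):
    """Reference (pure-PyTorch) gated RMSNorm.

    Hypothetical sketch of the fallback path described above; the real
    fused CUDA kernel may differ in precision and memory layout.
    """
    def _rmsnorm(t):
        # Normalize by the root-mean-square over the last dimension,
        # computed in float32 for numerical stability.
        var = t.float().pow(2).mean(dim=-1, keepdim=True)
        return (t.float() * torch.rsqrt(var + eps)).to(t.dtype) * weight

    if z is None:
        return _rmsnorm(x)
    if norm_before_gate:
        return _rmsnorm(x) * F.silu(z)   # rmsnorm(x) * silu(z)
    return _rmsnorm(x * F.silu(z))       # rmsnorm(x * silu(z))
```

With a unit ``weight`` and no gate, each row of the output has RMS approximately 1; passing a zero gate ``z`` zeroes the output in the default ``norm_before_gate=True`` mode, since ``silu(0) == 0``.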